Abstract


A  random  mapping  is  a  random  graph  where  every  vertex  has  outdegree  one.  Previous 
work  was  concerned  mostly  with  a  uniform  probability  distribution  on  these  mappings. 
In  contrast,  this  investigation  assumes  a  non-uniform  model,  where  different  mappings 
have  different  probabilities. 

An  important  application  is  the  analysis  of  a  factorization  heuristic  due  to  Pollard 
and  Brent.  The  model  involved  is  a  random  mapping  where  every  vertex  has  indegree 
either  0  or  d.  This  distribution  belongs  to  a  class  called  permutation  invariant.  A  study 
of  the  general  properties  of  permutation  invariant  mappings  combined  with  the  analysis 
of  this  particular  distribution  made  possible  the  computation  of  the  expected  running 
time of this factorization method, settling an open conjecture of Pollard.

Introduction


The  main  objective  of  this  dissertation  is  the  study  of  finite  random  functions  under 
the  assumption  that  different  functions  might  have  different  probabilities. 

There already exists a relatively large literature about random functions; more
than thirty journal articles in the last twenty-five years have been inspired by
various problems in statistics, probability theory, and biomathematics. In computer
science, random mappings appear in the analysis of hashing algorithms, random
number generators, cryptography problems, and integer factorization algorithms.
Most  of  the  published  results  concern  the  uniform  probability  distribution  on  the 
space  of  random  functions;  a  uniform  model  is  inadequate  for  the  precise  study  of 
certain  algorithms,  which  motivated  the  present  work. 

The  first  chapter  introduces  a  certain  type  of  probability  distribution  on  the 
space  of  random  mappings,  called  permutation  invariant,  that  turns  out  to  be  a 
key  concept  in  the  analysis  that  follows.  Using  combinatorial  methods  it  is  shown 
that for permutation invariant distributions the random variables of interest are
related  in  simple  and  unexpected  ways,  and  it  is  enough  to  know  the  probability 
generating  function  for  one  of  the  variables  to  compute  the  probability  generating 
functions  for  the  others. 




These  results  are  applied  in  the  second  chapter  to  the  uniform  distribution 
model  to  obtain  in  a  simple  and  consistent  manner  most  of  the  previously  known 
results as well as some new or sharper formulae. Through the use of novel
Abel-type identities and the algebra of Q-series, all the results are expressed in terms
of the Q-functions, thus explaining some of the "mysteriously" similar asymptotic
behavior already noted in the literature. The last sections of chapter two do not


Chapter 1

Permutation  invariant  mappings 


1.1.  Introduction  and  notations 


A mapping is a function from a set into itself. The graph of a mapping $f$ is a
directed graph where every node has outdegree one; it consists of a collection of
directed trees with their roots linked into directed loops. The elements that are
part of a loop are called cyclic or recurrent. (That is, an element $x$ is recurrent
iff there exists $j > 0$ such that $f^j(x) = x$.) The restriction of $f$ to the recurrent
elements is clearly a permutation of those elements.

The set of $n^n$ functions $f : \{1,\dots,n\} \to \{1,\dots,n\}$ is denoted $F[n]$. To each
$f \in F[n]$ we associate a probability weight $w(f)$, such that $\sum_{f \in F[n]} w(f) = 1$.

A probability weight $w$ is called permutation invariant if for any permutation
$p$ of $\{1,\dots,n\}$ and for any function $f$, we have $w(f) = w(p \circ f)$, where the notation
$p \circ f$ is defined by $(p \circ f)(x) = f(p(x))$. An equivalent definition is that a probability
weight is permutation invariant if any two functions that have as their image the
same multiset are equally likely. (Indeed, consider two functions $f$ and $g$ such that
$f(\{1,\dots,n\}) = g(\{1,\dots,n\})$ as multisets. Then there exists a permutation $p$ such
that $g = p \circ f$.)

As an example of a permutation invariant distribution consider the weight
$w_A$ defined, for a fixed set $A \subseteq \{1,\dots,n\}$ of size $k$, by the rule

$$w_A(f) = \begin{cases} k^{-n}, & \text{if } f(x) \in A \text{ for } 1 \le x \le n; \\ 0, & \text{otherwise.} \end{cases}$$

This weight is permutation invariant because for any permutation $p$ of $\{1,\dots,n\}$,
if $f(x) \in A$ for $1 \le x \le n$, then $f(p(x)) \in A$ for $1 \le x \le n$. More generally,
if $(w_1,\dots,w_n)$ are any weights that sum to 1, the weight function
$w(f) = w_{f(1)} \cdots w_{f(n)}$ is permutation invariant.

As an example of a weight that is not permutation invariant consider

$$w(f) = \begin{cases} 1/n!, & \text{if } f(x) \le x \text{ for } 1 \le x \le n; \\ 0, & \text{otherwise.} \end{cases}$$
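The two examples above can be checked by brute force for small $n$. The following is a minimal sketch, not part of the original development: the helper names and the letter weights (1/2, 1/3, 1/6) are hypothetical choices for illustration, elements are numbered from 0 instead of 1, and a mapping is stored as a tuple whose entry at position $x$ is the image of $x$.

```python
from fractions import Fraction
from itertools import permutations, product
from math import factorial

def is_permutation_invariant(weight, n):
    """Check w(f) == w(p o f) for all f and p, where (p o f)(x) = f(p(x))."""
    for f in product(range(n), repeat=n):      # f[x] is the image of x (0-based)
        for p in permutations(range(n)):
            pf = tuple(f[p[x]] for x in range(n))
            if weight(f) != weight(pf):
                return False
    return True

def product_weight(u):
    """w(f) = u[f(0)] * ... * u[f(n-1)] for letter weights u that sum to 1."""
    def w(f):
        result = Fraction(1)
        for y in f:
            result *= u[y]
        return result
    return w

def below_diagonal_weight(f):
    """w(f) = 1/n! if f(x) <= x for every x, and 0 otherwise."""
    n = len(f)
    inside = all(f[x] <= x for x in range(n))
    return Fraction(1, factorial(n)) if inside else Fraction(0)

# Hypothetical letter weights, chosen only for illustration.
u = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
invariant_product = is_permutation_invariant(product_weight(u), 3)
invariant_diagonal = is_permutation_invariant(below_diagonal_weight, 3)
```

Exact rational arithmetic is used so that the comparison of weights is not affected by rounding.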


Let $f$ be a function in $F[n]$. Starting from some $x_0$, define a sequence $x_{i+1} = f(x_i)$. This sequence is
ultimately periodic for every $x_0$, and there exist numbers $\lambda$ and $\mu$, which depend
on $x_0$, such that $x_0, \dots, x_{\mu+\lambda-1}$ are distinct, but $x_{i+\lambda} = x_i$ for $i \ge \mu$. The number
$\lambda(x_0, f)$ is called the cycle length, and the number $\mu(x_0, f)$ is called the tail length.
The cycle length is always positive, but the tail length can be 0. In fact, $x_0$ is
cyclic if and only if $\mu(x_0, f) = 0$.
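As an illustration of these definitions, the tail length and the cycle length can be computed by following the sequence until an element repeats. This is a minimal sketch under stated assumptions: the function name `tail_and_cycle` and the five-element mapping are hypothetical, and the mapping is stored as a dictionary from each element to its image.

```python
def tail_and_cycle(f, x0):
    """Return (tail length, cycle length) of the sequence x0, f(x0), f(f(x0)), ..."""
    first_seen = {}              # element -> index of its first occurrence
    x, i = x0, 0
    while x not in first_seen:
        first_seen[x] = i
        x, i = f[x], i + 1
    tail = first_seen[x]         # index at which the cycle is entered
    cycle = i - tail             # steps between the two visits of x
    return tail, cycle

# Hypothetical mapping: 1 -> 2 -> 3 -> 4 -> 2 and a fixed point 5.
f = {1: 2, 2: 3, 3: 4, 4: 2, 5: 5}
from_1 = tail_and_cycle(f, 1)
from_3 = tail_and_cycle(f, 3)
from_5 = tail_and_cycle(f, 5)
```

Element 1 has a tail of length 1 leading into a cycle of length 3, while elements 3 and 5 are cyclic and therefore have tail length 0.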

Having fixed a probability weight on $F[n]$ we can define the following two
random variables: $\mu(x)$ = the length of the tail starting from a certain element $x$
and $\lambda(x)$ = the length of the period starting from $x$. We also define the random
variables $\mu$ and $\lambda$ that represent the length of the tail and the length of the period
when $f$ is chosen in $F[n]$ according to the probability weight $w$, and the starting
point is chosen uniformly at random in the set $\{1,\dots,n\}$. Hence in this case the
probability space is $F[n] \times \{1,\dots,n\}$, while in the former case it was $F[n]$ only.

To avoid confusion, we shall use the notation $\Pr(X)$ to mean the probability of
the event $X$ when the probability space is $F[n]$ and the notation $\widehat{\Pr}(X)$ when the
probability space is $F[n] \times \{1,\dots,n\}$. Similar conventions apply to E, var, and
cov.


From this definition it follows that

$$\widehat{\Pr}(\lambda = k) = \frac{1}{n} \sum_{x \in \{1,\dots,n\}} \Pr(\lambda(x) = k), \tag{1.1}$$

and similarly

$$\widehat{\Pr}(\mu = k) = \frac{1}{n} \sum_{x \in \{1,\dots,n\}} \Pr(\mu(x) = k). \tag{1.2}$$

Yet another random variable of interest is $r$, the total number of cyclic elements.
(The probability space for $r$ is always $F[n]$.) The nice thing about permutation
invariant weights is that there exist simple relations between the probability
distributions of $\lambda$, $\mu$, and $r$. We shall explore these relationships in the next
sections.


1.2. The distribution of $\lambda$ and $\mu$


Lemma 1. Given a permutation invariant probability weight $w$ on $F[n]$, for any
fixed starting point $x \in \{1,\dots,n\}$

$$\Pr(\lambda(x) = i \text{ and } \mu(x) = j \mid x \text{ is not cyclic}) = \Pr(\lambda(x) = j \text{ and } \mu(x) = i \mid x \text{ is not cyclic}).$$

Proof: The idea of the proof is to show a 1-1 correspondence between mappings
where $\lambda(x) = i$ and $\mu(x) = j$ and mappings where $\lambda(x) = j$ and $\mu(x) = i$.

Consider a mapping $f$ such that $\lambda(x,f) = i$ and $\mu(x,f) = j$. (Because $x$ is
not cyclic, $j$ must be strictly positive.) Consider another mapping $g$, identical
to $f$ everywhere except for the points $x$ and $f^j(x)$, where $g(x) = f(f^j(x))$ and
$g(f^j(x)) = f(x)$. (See Figure 1.)


Figure 1.1. Two corresponding mappings.

It is clear that $\lambda(x,g) = j$ and $\mu(x,g) = i$. By construction $w(g) = w(f)$, because $w$
is permutation invariant and $g = (x, f^j(x)) \circ f$. Furthermore the correspondence
$f \leftrightarrow g$ is one to one, and the desired result is obtained by summing the probabilities
of all $f$'s with $\lambda(x,f) = i$ and $\mu(x,f) = j$. ∎
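The correspondence used in this proof can be checked exhaustively on a small case. The sketch below is an illustration only, under the assumption that the second point of the exchange is the first cyclic point reached from $x$; the helper names are hypothetical, elements are numbered from 0, and a mapping is a tuple of images.

```python
from itertools import product

def tail_and_cycle(f, x):
    """Return (tail length, cycle length) of x under f (a 0-based tuple)."""
    first_seen, i = {}, 0
    while x not in first_seen:
        first_seen[x] = i
        x, i = f[x], i + 1
    return first_seen[x], i - first_seen[x]

def swap(f, x):
    """Exchange the images of x and of the first cyclic point reached from x."""
    tail, _ = tail_and_cycle(f, x)
    y = x
    for _ in range(tail):
        y = f[y]                     # y is the first cyclic point reached from x
    g = list(f)
    g[x], g[y] = f[y], f[x]
    return tuple(g)

n, x = 4, 0
pairs_checked, all_ok = 0, True
for f in product(range(n), repeat=n):
    tail, cycle = tail_and_cycle(f, x)
    if tail == 0:
        continue                     # x is cyclic; the lemma conditions on tail > 0
    g = swap(f, x)
    ok = (tail_and_cycle(g, x) == (cycle, tail)   # the two lengths trade places
          and sorted(g) == sorted(f)              # same multiset of images
          and swap(g, x) == f)                    # applying the exchange twice undoes it
    all_ok = all_ok and ok
    pairs_checked += 1
```

Because the exchange is its own inverse, it pairs up the mappings with tail length $j$ and cycle length $i$ with those having the two values interchanged, and equal image multisets give equal weights under any permutation invariant weight.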


Corollary 2. Given a permutation invariant probability weight $w$ on $F[n]$, for
any starting point $x \in \{1,\dots,n\}$

$$\Pr(\mu(x) = i \mid x \text{ is not cyclic}) = \Pr(\lambda(x) = i \mid x \text{ is not cyclic}).$$ ∎

Corollary 3. Given a permutation invariant probability weight $w$ on $F[n]$, for
any starting point $x \in \{1,\dots,n\}$, and for any $i > 0$

$$\Pr(\mu(x) = i) = \Pr(\lambda(x) = i) - \Pr(\lambda(x) = i \text{ and } x \text{ is cyclic}).$$

Proof: Because $i$ is positive,

$$\Pr(\mu(x) = i) = \Pr(\mu(x) = i \text{ and } x \text{ is not cyclic}).$$

Using the Bayes rule and Corollary 2, we obtain

$$\begin{aligned}
\Pr(\mu(x) = i) &= \Pr(\mu(x) = i \text{ and } x \text{ is not cyclic}) \\
&= \Pr(x \text{ is not cyclic}) \Pr(\mu(x) = i \mid x \text{ is not cyclic}) \\
&= \Pr(x \text{ is not cyclic}) \Pr(\lambda(x) = i \mid x \text{ is not cyclic}) \\
&= \Pr(\lambda(x) = i \text{ and } x \text{ is not cyclic}) \\
&= \Pr(\lambda(x) = i) - \Pr(\lambda(x) = i \text{ and } x \text{ is cyclic}).
\end{aligned}$$ ∎


For several of the following proofs it is convenient to define an equivalence
relation on $F[n]$ as follows. Two mappings $f, g \in F[n]$ will be called similar if they
have the same set of cyclic elements, $A \subseteq \{1,\dots,n\}$, and $f(x) = g(x)$ for all $x \notin A$.
That means that the set of mappings similar to $f$ is obtained by composing an
arbitrary permutation of the cyclic elements of $f$ with $f$ itself. Clearly similarity
is an equivalence relation, and the set of all the equivalence classes under it will be
denoted $E[n]$.

Observe that

• If $f$ belongs to some class $E \in E[n]$ and $f$ has $k$ cyclic elements then $|E| = k!$,
and hence the only possible values for the cardinality of an arbitrary equivalence
class are $1!, 2!, \dots$. (The number of equivalence classes of size $k!$ will be
discussed in Section 6.)

• If $f$ and $g$ belong to the same equivalence class and the weight $w$ is permutation
invariant, then $w(f) = w(g)$.
Lemma 4. Given a permutation invariant probability weight $w$ on $F[n]$ and an
integer $1 \le i \le n$, if $x$ is chosen uniformly at random, then

$$\Pr(\lambda(x) = i \text{ and } x \text{ is cyclic}) = \frac{1}{n} \sum_{k \ge i} \Pr(r = k).$$

Proof: Fix one similarity equivalence class; all the mappings in it have the same
cyclic elements. Let $k$ be the number of these cyclic elements. The probability
that $x$ is among them is $k/n$, and the probability that the cycle containing $x$ has
length $i \le k$ is exactly $1/k$ because all the permutations of the cyclic elements are
equally likely (see [Knuth73a, ex. 1.3.3-17]). Summing on all possible sets of $k$
cyclic elements and on all $k \ge i$ completes the proof; more formally, writing $w(S)$
for the total weight of a set $S$ of mappings, we have

$$\begin{aligned}
\Pr(\lambda(x) = i \text{ and } x \text{ is cyclic})
&= \frac{1}{n} \sum_{1 \le x \le n} w(\{f \in F[n] \mid \lambda(x,f) = i \text{ and } x \text{ cyclic in } f\}) \\
&= \frac{1}{n} \sum_{i \le k \le n} \; \sum_{|E| = k!} \; \sum_{1 \le x \le n} w(\{f \in E \mid \lambda(x,f) = i \text{ and } x \text{ cyclic in } f\}) \\
&= \frac{1}{n} \sum_{i \le k \le n} \; \sum_{|E| = k!} w(E) \cdot k \cdot \frac{1}{k} \\
&= \frac{1}{n} \sum_{i \le k \le n} \Pr(r = k).
\end{aligned}$$ ∎

A  somewhat  surprising  property  of  the  permutation  invariant  mappings  follows 
directly  from  the  above  lemma: 


Corollary 5. For any permutation invariant probability weight $w$ on $F[n]$ the
expected number of fixed points is 1.

Proof: From Lemma 4, if $x$ is chosen uniformly at random then

$$\Pr(\lambda(x) = 1 \text{ and } x \text{ is cyclic}) = \frac{1}{n} \sum_{k \ge 1} \Pr(r = k) = \frac{1}{n},$$

since every function has at least one cyclic element. On the other hand, if $x$ is
chosen uniformly at random then

$$\Pr(\lambda(x) = 1 \text{ and } x \text{ is cyclic}) = \frac{1}{n} \sum_{x \in \{1,\dots,n\}} \Pr(\lambda(x) = 1 \text{ and } x \text{ is cyclic}) = \frac{1}{n} \mathrm{E}(\text{number of fixed points}).$$ ∎
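Corollary 5 is easy to confirm numerically for product weights. The sketch below is only an illustration: the function name and the particular letter weights are hypothetical, elements are numbered from 0, and the expectation is computed exactly by enumerating all mappings.

```python
from fractions import Fraction
from itertools import product

def expected_fixed_points(u):
    """Exact E(number of fixed points) under w(f) = u[f(0)] * ... * u[f(n-1)]."""
    n = len(u)
    total = Fraction(0)
    for f in product(range(n), repeat=n):      # f[x] is the image of x
        weight = Fraction(1)
        for y in f:
            weight *= u[y]
        total += weight * sum(1 for x in range(n) if f[x] == x)
    return total

# A skewed weight on 3 elements and the uniform weight on 4 elements.
skewed = expected_fixed_points([Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)])
uniform = expected_fixed_points([Fraction(1, 4)] * 4)
```

For a product weight the result can also be seen directly: element $x$ is a fixed point with probability $w_x$, and these probabilities sum to 1.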
Lemma 6. Given a permutation invariant probability weight $w$ on $F[n]$, for any
fixed starting point $x$ and any fixed integer $1 \le i \le n$,

$$\Pr(\lambda(x) = i) = \sum_{k \ge i} \frac{\Pr(r = k)}{k}.$$

Proof: Consider a certain mapping $f$ such that $y$ is the first cyclic point reached
from $x$. Within the equivalence class of $f$, all permutations induced by the cyclic
points are equally likely, and therefore the length of the cycle containing $y$ is
uniformly distributed between 1 and $k$, where $k$ is the number of cyclic points. The
proof is completed by summing over all the equivalence classes, weighted by their
probability, as in the proof of Lemma 4. ∎


Corollary 7. Given a permutation invariant probability weight $w$ on $F[n]$, for all
$i > 0$,

$$\widehat{\Pr}(\mu = i) = \widehat{\Pr}(\lambda = i) - \frac{1}{n} \sum_{k \ge i} \Pr(r = k).$$

Proof: By Corollary 3 and Lemma 6, we have

$$\Pr(\mu(x) = i) = \Pr(\lambda(x) = i) - \Pr(\lambda(x) = i \text{ and } x \text{ is cyclic}),$$

for any fixed $x$, and therefore also for $x$ chosen uniformly at random, in which case
we can also apply Lemma 4. ∎


We are now ready to derive the relations between the distribution of the number
of cyclic points, $r$, and the distribution of $\lambda$ and $\mu$.


Theorem 8. Let the probability generating functions for $\lambda$, $\mu$, and $r$ be $L(z)$,
$M(z)$, and $C(z)$, respectively. Given any permutation invariant probability weight
$w$ on $F[n]$, these functions satisfy

$$L(z) = \frac{z}{z-1} \int_1^z \frac{C(t)}{t}\,dt$$

and

$$M(z) = \frac{C'(1)}{n} + L(z) - \frac{z\,(C(z) - 1)}{n(z-1)}.$$
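Proof: By definition and Lemma 6 applied to a starting point chosen uniformly at
random,

$$L(z) = \sum_{i} \widehat{\Pr}(\lambda = i)\, z^i = \sum_{i \ge 1} z^i \sum_{k \ge i} \frac{\Pr(r = k)}{k} = \sum_{k \ge 1} \frac{\Pr(r = k)}{k} \sum_{1 \le i \le k} z^i = \sum_{k \ge 1} \frac{\Pr(r = k)}{k} \cdot \frac{z\,(z^k - 1)}{z - 1}.$$

On the other hand

$$\int_1^z \frac{C(t)}{t}\,dt = \sum_{k \ge 1} \Pr(r = k) \int_1^z t^{k-1}\,dt = \sum_{k \ge 1} \Pr(r = k)\, \frac{z^k - 1}{k},$$

which proves the first part of the theorem.

For the second part we use Corollary 7, from which we obtain

$$\begin{aligned}
\sum_{i \ge 1} \widehat{\Pr}(\mu = i)\, z^i &= L(z) - \sum_{i \ge 1} \frac{z^i}{n} \sum_{k \ge i} \Pr(r = k) = L(z) - \frac{1}{n} \sum_{k \ge 1} \Pr(r = k) \sum_{1 \le i \le k} z^i \\
&= L(z) - \frac{1}{n} \sum_{k \ge 1} \Pr(r = k)\, \frac{z^{k+1} - z}{z - 1} = L(z) - \frac{z\,(C(z) - 1)}{n(z-1)}.
\end{aligned}$$

The probability that $\mu = 0$ is the same as the probability of choosing a cyclic
element,

$$\widehat{\Pr}(\mu = 0) = \sum_k \Pr(r = k)\, \frac{k}{n} = \frac{C'(1)}{n}.$$

Combining the last two equations we get

$$M(z) = \frac{C'(1)}{n} + L(z) - \frac{z\,(C(z) - 1)}{n(z-1)}.$$ ∎

In a similar manner, it is possible to obtain expressions for $C(z)$ and $L(z)$ in
terms of $M(z)$, or for $M(z)$ and $C(z)$ in terms of $L(z)$.

As a quick check, note that by L'Hospital's rule

$$\lim_{z \to 1} L(z) = \lim_{z \to 1} \frac{C(z)}{z} = C(1) = 1,$$

and also

$$\lim_{z \to 1} M(z) = \frac{1}{n} C'(1) + \lim_{z \to 1} L(z) - \frac{1}{n} \lim_{z \to 1} C'(z) = 1.$$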


Using Theorem 8 we can easily compute the means of $\lambda$ and $\mu$ in terms of the
moments of $r$, but it is more fun to prove their relations directly.

Theorem 9. Given a permutation invariant probability weight $w$ on $F[n]$

$$\mathrm{E}(\lambda) = \frac{\mathrm{E}(r) + 1}{2}.$$

Proof: Choose a fixed starting point $x$. Let $y$ be the first cyclic point reached
from $x$. By the argument used in the proof of Lemma 6, the expected length of
the cycle containing $y$ averaged over all the mappings with $k$ cyclic elements is
$(k + 1)/2$. Hence

$$\mathrm{E}(\lambda(x)) = \sum_k \Pr(r = k)\, \frac{k+1}{2} = \frac{\mathrm{E}(r) + 1}{2}.$$ ∎


Theorem 10. Given a permutation invariant probability weight $w$ on $F[n]$

$$\mathrm{E}(\mu) = \mathrm{E}(\lambda) - \frac{1}{2n} \left(\mathrm{E}(r) + \mathrm{E}(r^2)\right).$$

Proof: If the starting point is not cyclic the expected values of $\lambda$ and $\mu$ are the
same (Corollary 2); therefore to obtain the mean value of $\mu$ we must subtract from
the mean value of $\lambda$ the contribution of the cyclic elements, which have $\mu = 0$.
Assuming $k$ cyclic elements, their total contribution is $k(k + 1)/2$, so that the
contribution per element is $k(k + 1)/(2n)$ and the claim follows. ∎
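Both identities can be verified exactly on a small example. The following sketch is illustrative only: the helper names and the letter weights (2/5, 3/10, 1/5, 1/10) are hypothetical, elements are numbered from 0, and the averages over the starting point are taken uniformly, as in the definition of the two random variables.

```python
from fractions import Fraction
from itertools import product

def tail_and_cycle(f, x):
    """Return (tail length, cycle length) of x under f (a 0-based tuple)."""
    first_seen, i = {}, 0
    while x not in first_seen:
        first_seen[x] = i
        x, i = f[x], i + 1
    return first_seen[x], i - first_seen[x]

def moments(u):
    """Return E(r), E(r^2), E(cycle length), E(tail length) for a product weight u."""
    n = len(u)
    e_r = e_r2 = e_cycle = e_tail = Fraction(0)
    for f in product(range(n), repeat=n):
        weight = Fraction(1)
        for y in f:
            weight *= u[y]
        stats = [tail_and_cycle(f, x) for x in range(n)]
        r = sum(1 for tail, _ in stats if tail == 0)   # number of cyclic elements
        e_r += weight * r
        e_r2 += weight * r * r
        e_tail += weight * Fraction(sum(t for t, _ in stats), n)
        e_cycle += weight * Fraction(sum(c for _, c in stats), n)
    return e_r, e_r2, e_cycle, e_tail

n = 4
u = [Fraction(2, 5), Fraction(3, 10), Fraction(1, 5), Fraction(1, 10)]
e_r, e_r2, e_cycle, e_tail = moments(u)
theorem_9_holds = e_cycle == (e_r + 1) / 2
theorem_10_holds = e_tail == e_cycle - (e_r + e_r2) / (2 * n)
```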


To obtain the same relations from Theorem 8, we compute

$$\begin{aligned}
L'(1) &= \lim_{z \to 1} \frac{(z-1)\,C(z) - \int_1^z \frac{C(t)}{t}\,dt}{(z-1)^2} \\
&= \lim_{z \to 1} \frac{C(z) + (z-1)\,C'(z) - C(z)/z}{2(z-1)} \\
&= \frac{C'(1) + C(1)}{2},
\end{aligned}$$

and similarly

$$\begin{aligned}
M'(1) &= L'(1) - \frac{1}{n} \lim_{z \to 1} \frac{(2z-1)\,C'(z) + z(z-1)\,C''(z) - C'(z)}{2(z-1)} \\
&= L'(1) - \frac{1}{2n} \left(C''(1) + 2C'(1)\right).
\end{aligned}$$

We can obtain higher order moments in the same manner but it is more
convenient to use the approach described in the next section.

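1.3. Higher order moments

Given a probability distribution with generating function $G(z) = \sum_k g_k z^k$, its
$l$th factorial moment is defined by

$$G^{(l)}(1) = \sum_{k \ge l} k(k-1)\cdots(k-l+1)\, g_k.$$

Our goal in this section is to express the higher order factorial moments of $\lambda$ and
$\mu$ using the higher order factorial moments of $r$, namely $\sum_k c_k\, k(k-1)\cdots(k-l+1)$, where

$$c_k = \Pr(r = k).$$

The following discussion is simplified if we use the notation

$$G_l = \frac{G^{(l)}(1)}{l!} = \sum_k g_k \binom{k}{l}.$$

By Taylor's theorem, if all the moments of $G$ exist, then

$$G(w + 1) = G_0 + wG_1 + w^2 G_2 + \cdots.$$

From Theorem 8 we have

$$\begin{aligned}
L(w + 1) &= \frac{w+1}{w} \int_1^{w+1} \frac{C(t)}{t}\,dt = \frac{w+1}{w} \int_0^{w} \frac{C(t+1)}{t+1}\,dt \\
&= \frac{w+1}{w} \int_0^{w} (C_0 + C_1 t + C_2 t^2 + \cdots)(1 - t + t^2 - \cdots)\,dt \\
&= \frac{w+1}{w} \int_0^{w} \left(C_0 + (C_1 - C_0)\,t + (C_2 - C_1 + C_0)\,t^2 + \cdots\right) dt \\
&= (w + 1)\left(C_0 + \tfrac{1}{2}(C_1 - C_0)\,w + \tfrac{1}{3}(C_2 - C_1 + C_0)\,w^2 + \cdots\right).
\end{aligned}$$

Hence for $l > 0$

$$L_l = \frac{1}{l+1} \sum_{0 \le i \le l} (-1)^{l-i}\, C_i + \frac{1}{l} \sum_{0 \le i \le l-1} (-1)^{l-1-i}\, C_i.$$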
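In particular

$$L'(1) = \tfrac{1}{2}\left(C'(1) + 1\right), \tag{1.6}$$

which we already know (Theorem 9), and

$$L''(1) = \tfrac{1}{3}\left(C''(1) + C'(1) - 1\right). \tag{1.7}$$

Some more formula crunching, better left to a computer, leads to

$$\mathrm{Var}(\lambda) = L''(1) + L'(1) - \left(L'(1)\right)^2 = \tfrac{1}{3} \mathrm{Var}(r) + \tfrac{1}{12} \mathrm{E}(r)^2 - \tfrac{1}{12}. \tag{1.8}$$

For the moments of $\mu$, first concentrate on

$$A(z) = \frac{z\,(C(z) - 1)}{n(z-1)}. \tag{1.9}$$

We have

$$A(w + 1) = \frac{w+1}{n} \left(C_1 + C_2 w + C_3 w^2 + \cdots\right), \tag{1.10}$$

hence

$$A_l = \frac{C_l + C_{l+1}}{n}, \qquad l > 0. \tag{1.11}$$

On the other hand

$$M(z) = \frac{C'(1)}{n} + L(z) - A(z),$$

so that finally

$$M_l = L_l - \frac{C_l + C_{l+1}}{n}, \qquad l > 0. \tag{1.12}$$

In particular

$$M'(1) = L'(1) - \frac{2C'(1) + C''(1)}{2n} \tag{1.13}$$

and

$$M''(1) = L''(1) - \frac{3C''(1) + C'''(1)}{3n}. \tag{1.14}$$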
1.4. The distribution of the number of cycles

Another interesting characteristic of a mapping is the number of components, or
equivalently, the number of cycles. This random variable (over $F[n]$) is denoted
$\beta$ and its probability generating function is denoted $B(z)$. We shall see below that
the distribution of $\beta$ is closely related to the distribution of $r$.

Theorem 11. Given a permutation invariant probability weight $w$ on $F[n]$, the
probability distribution of the number of components satisfies

$$\Pr(\beta = j) = \sum_k {k \brack j} \frac{1}{k!} \Pr(r = k).$$

Proof: Fix $k$ cyclic elements. All their permutations are equally likely, hence the
probability of their forming $j$ cycles is exactly ${k \brack j} / k!$, where ${k \brack j}$ is a signless Stirling
number of the first kind.¹ (See, for example, [Knuth73a, §1.2.10].) ∎

Corollary 12. Given a permutation invariant probability weight $w$ on $F[n]$, the
probability of a function being connected is

$$\Pr(\beta = 1) = \sum_k \frac{\Pr(r = k)}{k} = \int_0^1 \frac{C(t)}{t}\,dt.$$

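From Theorem 11, the probability generating function of $\beta$ satisfies

$$B(z) = \sum_j z^j \sum_k {k \brack j} \frac{c_k}{k!}. \tag{1.15}$$

On the other hand the exponential generating function for Stirling numbers of the
first kind is

$$\sum_k {k \brack j} \frac{t^k}{k!} = \frac{1}{j!} \ln^j \frac{1}{1-t},$$

and therefore, using the Hadamard product,

$$\sum_k {k \brack j} \frac{c_k}{k!} = \frac{1}{2\pi i} \oint C(t)\, \frac{1}{j!} \ln^j \frac{t}{t-1}\, \frac{dt}{t}, \tag{1.16}$$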
¹ With this notation:
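where the integral is on a circle around the origin.
Combining equations (15) and (16), we obtain

$$B(z) = \frac{1}{2\pi i} \oint C(t) \left(\frac{t}{t-1}\right)^{z} \frac{dt}{t}. \tag{1.17}$$

To compute the moments of $\beta$ we differentiate equation (17) with respect to
$z$, at $z = 1$. We obtain

$$B^{(l)}(1) = \frac{1}{2\pi i} \oint C(t)\, \frac{t}{t-1} \ln^l \frac{t}{t-1}\, \frac{dt}{t}.$$

Now we use the expansion

$$\frac{1}{(1-z)^r} \ln^l \frac{1}{1-z} = l! \sum_k {k+r \brack l+r}_r \frac{z^k}{k!},$$

where ${k \brack l}_r$ is an r-Stirling number of the first kind² (see [Broder84a]). Setting
$z = 1/t$ and $r = 1$ we get

$$\frac{t}{t-1} \ln^l \frac{t}{t-1} = l! \sum_k \frac{1}{k!} {k+1 \brack l+1}_1 t^{-k}.$$

For $l > 0$, ${k+1 \brack l+1}_1 = {k+1 \brack l+1}$, and we obtain

$$B^{(l)}(1) = l! \sum_k \frac{c_k}{k!} {k+1 \brack l+1}. \tag{1.18}$$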

² The r-Stirling numbers of the first kind, ${k \brack l}_r$, count the number of permutations of $k$
elements, with $l$ cycles, such that the elements $1, 2, \dots, r$ are in distinct cycles. The r-Stirling
numbers of the second kind are discussed in section 2.1.1.

For the first moments it might be preferable to use the identities ([Zave78])

$${k+1 \brack 2} = k!\, H_k$$

and

$${k+1 \brack 3} = \frac{k!}{2} \left(H_k^2 - H_k^{(2)}\right),$$

where $H_k^{(m)}$ are (generalized) harmonic numbers, $H_k = \sum_{1 \le i \le k} 1/i$ and $H_k^{(2)} = \sum_{1 \le i \le k} 1/i^2$.
From these identities

$$\mathrm{E}(\beta) = \sum_k c_k H_k \tag{1.19}$$

and

$$\mathrm{E}(\beta(\beta - 1)) = \sum_k c_k \left(H_k^2 - H_k^{(2)}\right). \tag{1.20}$$

We have seen so far that the distributions of several important random variables
associated with a permutation invariant probability distribution on $F[n]$ are
determined by the distribution of $r$, the number of cyclic elements. The next two
sections show how this distribution can be replaced by the distribution of a simpler
entity.


1.5. The Foata-Fuchs encoding of mappings

Let $\Sigma = \{1, 2, \dots, n\}$ be an alphabet. Using the terminology of Comtet [Comtet74],
to each word $w$ over $\Sigma$ we associate its Abelian image $\tau(w)$ obtained by replacing
each occurrence of the letter $i$ in $w$ by the commutative variable $x_i$. For instance
$\tau(1\,2\,3\,2) = x_1 x_2^2 x_3$. The enumerator of a multiset of words $\Omega \subseteq \Sigma^*$ is defined to
be the polynomial

$$\xi(\Omega) = \sum_{w \in \Omega} \tau(w).$$

Clearly the coefficient of $x_1^{i_1} x_2^{i_2} \cdots$ in $\xi(\Omega)$ represents the number of words in
$\Omega$ that contain exactly $i_1$ occurrences of 1, $i_2$ occurrences of 2, and so on.

Sometimes only some letters are of interest; in this case the variables corresponding
to other letters are assigned the value 1, to obtain the enumerator by
the number of occurrences of the distinguished letters. This is the same as the
enumerator of the multiset obtained from $\Omega$ by deleting from each word the
undistinguished letters. For instance the enumerator of $\Sigma^n$ is $(x_1 + x_2 + \cdots + x_n)^n$; the
enumerator of $\Sigma^n$ by the number of occurrences of 1 and 3 is $(x_1 + x_3 + n - 2)^n$.

An encoding is a 1-1 correspondence between the $n^n$ distinct mappings from
$\{1, 2, \dots, n\}$ into $\{1, 2, \dots, n\}$ and the $n^n$ distinct words of length $n$ over the
alphabet $\Sigma = \{1, 2, \dots, n\}$. Let $f$ be a mapping. The trivial encoding of $f$ is given
by

$$f \mapsto f(1)\, f(2) \cdots f(n);$$
that is, the word associated to $f$ is obtained by concatenating the letters
corresponding to the values taken by $f$ in $1, 2, \dots, n$. The trivial encoding of the
mapping in Figure 2 is

4 3 15 15 6 3 16 13 9 4 10 7 16 9 5 7.

The trivial encoding of a mapping $f$ is denoted $C_0(f)$.


Figure 1.2. A mapping on 16 elements.

As described below, the Foata-Fuchs encoding (FF-encoding) [FF70] of a mapping
$f$ is the concatenation of $l + 1$ words, $w_0 w_1 \cdots w_l$, where $l$ is the number
of leaves (nodes with indegree 0) in the graph of $f$. The word $w_0$ describes the
permutation induced by the cyclic elements and is generated by the following
algorithm. (After each step, the result of this algorithm, applied to the mapping in
Figure 2, is shown in square brackets.)

Algorithm  A. 

1.  Write  the  permutation  as  a  product  of  cycles. 

[(3,15,5,6)(9)(7,16)]

2. Reverse each cycle.

[(6,5,15,3)(9)(16,7)]

3. Rotate each cycle such that the maximum element in each cycle is in the first
position. The element in the first position of a cycle is called the cycle leader.

[(15,3,6,5)(9)(16,7)]

4. Put the cycles in increasing order of their cycle leader.

[(9)(15,3,6,5)(16,7)]

5. Remove parentheses and replace each number by its corresponding letter.

[9 15 3 6 5 16 7]


The result is actually another permutation of the letters of $C_0(f)$. It is clear
that the transformation is 1-1 because the algorithm can be reversed. The end
of a cycle is "signalled" by a new left-to-right maximum. This transformation is
implied in [Riordan58, Chap. 8] and formalized and generalized in [Foata65]. It
was later referred to as "the first fundamental transformation for permutations"
[FoataSS].

Now let $a_1, a_2, \dots, a_l$, $a_1 < a_2 < \cdots < a_l$, be the leaves in the graph of $f$. The
word $w_i$ consists of the labels of the nodes on the path from $a_i$ to the first node
already appearing in $w_0, \dots, w_{i-1}$, including this node and excluding $a_i$, in reverse
order. For the mapping in Figure 2 we have $l = 6$, $a_1 = 1$, $a_2 = 2$, $a_3 = 8$, $a_4 = 11$,
$a_5 = 12$, $a_6 = 14$, and $w_1 = 15\ 4$, $w_2 = 3$, $w_3 = 16\ 13$, $w_4 = 4\ 10$, $w_5 = 7$, $w_6 = 9$.

Therefore the complete encoding of the function in Figure 2 is

9 15 3 6 5 16 7 15 4 3 16 13 4 10 7 9.
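The construction just described can be sketched in a few lines. This is an illustrative implementation under stated assumptions, not the authors' code: the function name `ff_encode` is hypothetical, and the 16-element mapping is the one read off from the trivial encoding given above for the mapping in Figure 2.

```python
def ff_encode(f):
    """Foata-Fuchs word of a mapping f given as a dict {x: f(x)} on {1, ..., n}."""
    n = len(f)
    # Cyclic elements are exactly those that survive n-fold iteration of the image.
    cyclic = set(f)
    for _ in range(n):
        cyclic = {f[x] for x in cyclic}
    # Algorithm A: write each cycle reversed, with its maximum (the leader) first.
    cycles, seen = [], set()
    for leader in sorted(cyclic, reverse=True):
        if leader in seen:
            continue
        forward = [leader]
        while f[forward[-1]] != leader:
            forward.append(f[forward[-1]])
        seen.update(forward)
        cycles.append([leader] + forward[:0:-1])
    word = [x for cycle in sorted(cycles) for x in cycle]   # increasing leaders
    # One reversed path per leaf, taken in increasing order of the leaves.
    placed = set(word)
    leaves = sorted(set(f) - set(f.values()))
    for a in leaves:
        path, x = [], f[a]
        while x not in placed:
            path.append(x)
            x = f[x]
        path.append(x)              # first node that already appears in the word
        placed.update(path)
        word.extend(reversed(path))
    return word

trivial = [4, 3, 15, 15, 6, 3, 16, 13, 9, 4, 10, 7, 16, 9, 5, 7]
f = {x: y for x, y in enumerate(trivial, start=1)}
word = ff_encode(f)
```

For this mapping the sketch reproduces the word displayed above, and the result is a rearrangement of the trivial encoding.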


It is clear from its definition that the FF-encoding is a permutation of the
trivial encoding. It is a 1-1 correspondence because it can be reversed. Given a
word $w$ of length $n$, the letters corresponding to $a_1, a_2, \dots, a_l$ are exactly those
letters among $1, \dots, n$ that do not appear in $w$, sorted in increasing order. The
subword $w_0$ ends before the first repeated letter, or $w_0 = w$ if no letter is repeated;
the beginning of each subword describing a path is "signalled" by a repeated letter;
and so on. Exactly $l$ letters are repeated since $l$ letters are left out.
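The reversal can be sketched along the same lines. The code below is an illustration under stated assumptions: the function name `ff_decode` is hypothetical, the input word is the FF-encoding of the mapping in Figure 2 as given in the text, and the subwords are recovered exactly as described in the paragraph above.

```python
def ff_decode(word):
    """Recover the mapping {x: f(x)} on {1, ..., n} from its Foata-Fuchs word."""
    n = len(word)
    leaves = sorted(set(range(1, n + 1)) - set(word))   # letters left out of the word
    # Split the word at every repeated letter: first block is w0, the rest are paths.
    blocks, seen = [[]], set()
    for letter in word:
        if letter in seen:
            blocks.append([])
        seen.add(letter)
        blocks[-1].append(letter)
    w0, paths = blocks[0], blocks[1:]
    f = {}
    # In w0 every new left-to-right maximum starts a cycle, written in reverse.
    cycles, best = [], 0
    for letter in w0:
        if letter > best:
            cycles.append([])
            best = letter
        cycles[-1].append(letter)
    for cycle in cycles:
        for j, letter in enumerate(cycle):
            f[letter] = cycle[j - 1]     # each entry maps to the entry before it
    # Each remaining block is the reversed path of one leaf.
    for a, block in zip(leaves, paths):
        for j in range(1, len(block)):
            f[block[j]] = block[j - 1]
        f[a] = block[-1]
    return f

word = [9, 15, 3, 6, 5, 16, 7, 15, 4, 3, 16, 13, 4, 10, 7, 9]
decoded = ff_decode(word)
trivial = [decoded[x] for x in range(1, len(word) + 1)]
```

Decoding the word of the running example gives back the trivial encoding of the mapping in Figure 2.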

Given a mapping $f$ its FF-encoding will be denoted $C_1(f)$. A set of mappings
$F'$ encodes into a set of words, denoted $C_1(F')$.

Examples

1. Let $F'$ be the family of functions $f : \{1,\dots,n\} \to \{1,\dots,n\}$ such that the
graph of $f$ is connected and $n$ is cyclic. Then

$$\xi(C_1(F')) = x_n (x_1 + x_2 + \cdots + x_n)^{n-1},$$

because any $f \in F'$ has exactly one cycle, and $n$ must be in it. Therefore $C_1(f)$
starts with $n$ followed by an arbitrary word of length $n - 1$.
2. Let $F'$ be the family of functions $f : \{1,\dots,n\} \to \{1,\dots,n\}$ such that $n$ is the
unique cyclic element. (The graph of every $f$ is a tree rooted at $n$.) Then

$$\xi(C_1(F')) = x_n x_n (x_1 + x_2 + \cdots + x_n)^{n-2}.$$

Making

$$x_1 = x_2 = \cdots = x_n = 1,$$

we find that the number of undirected trees on $n$ labelled vertices is $n^{n-2}$.
(Cayley's theorem)

1.6. The distribution of the repetition index

The repetition index $\nu$ of a mapping $f$ is the maximum number $t$ such that the
values $f(1), f(2), \dots, f(t)$ are all distinct. For example, the mapping in Figure 2
has $\nu = 3$. Our interest in the repetition index is motivated by the next theorem.

Theorem 13. For any permutation invariant probability weight $w$ on $F[n]$, the
probability distributions of the total number of cyclic elements and of the repetition
index are equal, that is for any $k$

$$\Pr(r = k) = \Pr(\nu = k).$$

Proof: Let $f$ be a mapping with $r(f) = k$. If we look at the FF-encoding of $f$
as the trivial encoding of another mapping $g$ (i.e. $C_0(g) = C_1(f)$), then $g$ has
the property that $w(g) = w(f)$ because $w$ is permutation invariant. Furthermore
$\nu(g) = k$ by construction. Summing over all $f$ with $r(f) = k$ completes the proof. ∎
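The equality of the two distributions can be confirmed exactly on a small case. The sketch below is illustrative only: the helper names and the letter weights are hypothetical, elements are numbered from 0, and a mapping is a tuple of images weighted by the product of the weights of its images.

```python
from fractions import Fraction
from itertools import product

def num_cyclic(f):
    """Number of cyclic elements: the size of the image of the n-th iterate of f."""
    image = set(range(len(f)))
    for _ in range(len(f)):
        image = {f[x] for x in image}
    return len(image)

def repetition_index(f):
    """Largest t such that the first t images are all distinct."""
    seen = set()
    for y in f:
        if y in seen:
            break
        seen.add(y)
    return len(seen)

# Hypothetical letter weights, chosen only for illustration.
u = [Fraction(2, 5), Fraction(3, 10), Fraction(1, 5), Fraction(1, 10)]
dist_r, dist_nu = {}, {}
for f in product(range(len(u)), repeat=len(u)):
    weight = Fraction(1)
    for y in f:
        weight *= u[y]
    r, nu = num_cyclic(f), repetition_index(f)
    dist_r[r] = dist_r.get(r, 0) + weight
    dist_nu[nu] = dist_nu.get(nu, 0) + weight
```

The distribution of the repetition index needs only a single pass over the word, which is why it is the more convenient of the two quantities to count.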

Theorem 13 is the basic tool for the computation of all the probability
distributions implied by a certain permutation invariant weight because in most cases it
is easy to determine the distribution of $\nu$, using simple string counting arguments.

Using the FF-encoding we can count the number of similarity equivalence
classes introduced in Section 2.

Theorem 14. The number of similarity equivalence classes of size $k!$ is

$$\binom{n}{k}\, k\, n^{n-k-1},$$

and the total number of equivalence classes is

$$|E[n]| = (n + 1)^{n-1}.$$

Proof: Consider two mappings, $f$ and $g$, belonging to the same equivalence class,
$E$. Assume that $|E| = k!$. (Hence $f$ and $g$ have $k$ cyclic elements.) Under this
assumption, if $C_1(f) = a_1 a_2 \dots a_n$ and $C_1(g) = b_1 b_2 \dots b_n$ we must have $a_j = b_j$
for all $j$ greater than $k$; $a_1 a_2 \dots a_k$ must be all distinct and must be a permutation
of $b_1 b_2 \dots b_k$; and $a_{k+1}$ must be equal to one of the letters $a_1, a_2, \dots, a_k$. It follows
that any equivalence class with $k!$ elements can be represented by a set of $k$ distinct
letters and a string of $n - k$ arbitrary letters, such that the first letter of the string
is in the set. Therefore the number of equivalence classes of size $k!$ is $\binom{n}{k} k\, n^{n-k-1}$.
The total number of equivalence classes is

$$\sum_{1 \le k \le n} \binom{n}{k}\, k\, n^{n-k-1},$$

which equals $(n + 1)^{n-1}$ because of the identity

$$\sum_k \binom{n}{k}\, k\, x^{k-1} y^{n-k} = n (x + y)^{n-1},$$

which is obtained by taking the derivative of $(x + y)^n$ with respect to $x$. ∎
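The counts in Theorem 14 can be checked by enumerating all mappings for a small $n$. This is a minimal sketch with hypothetical helper names; elements are numbered from 0, and a similarity class is identified by its set of cyclic elements together with the images of the non-cyclic elements.

```python
from itertools import product
from math import comb, factorial

def cyclic_elements(f):
    """Set of cyclic elements: the image of the n-th iterate of f."""
    image = set(range(len(f)))
    for _ in range(len(f)):
        image = {f[x] for x in image}
    return frozenset(image)

def class_counts(n):
    """Map k to the number of similarity classes whose mappings have k cyclic elements."""
    sizes = {}
    for f in product(range(n), repeat=n):
        cyc = cyclic_elements(f)
        tree_part = tuple(f[x] for x in range(n) if x not in cyc)
        key = (cyc, tree_part)
        sizes[key] = sizes.get(key, 0) + 1
    counts = {}
    for (cyc, _), size in sizes.items():
        assert size == factorial(len(cyc))     # a class with k cyclic elements has k! members
        counts[len(cyc)] = counts.get(len(cyc), 0) + 1
    return counts

n = 4
counts = class_counts(n)
# binom(n, k) * k * n^(n-k-1), written with an exact integer division by n.
predicted = {k: comb(n, k) * k * n ** (n - k) // n for k in range(1, n + 1)}
total_classes = sum(counts.values())
```

For $n = 4$ the predicted counts are 64, 48, 12 and 1 classes for $k = 1, 2, 3, 4$, which add up to $5^3 = 125$.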


1.7. The distribution of $\rho$


In certain cases we are interested in the distribution of $\rho(x,f) = \lambda(x,f) + \mu(x,f)$,
that is, the sum of the length of the tail and the length of the period starting from
$x$ in the mapping $f$. Another interpretation for $\rho(x,f)$ is to see it as the number of
elements reachable from $x$, or the number of "descendants" of $x$, in the graph of $f$.
By analogy with $\lambda$ and $\mu$, the random variable $\rho(x)$ represents the value of $\rho(x,f)$
when $x$ is fixed and $f$ is chosen in $F[n]$ according to the probability weight $w$, and
the random variable $\rho$ represents the value of $\rho(x)$ when $x$ is chosen uniformly at
random in $\{1,\dots,n\}$.

We know the expected values of $\lambda$ and of $\mu$, hence the expected value of $\rho$ is not
hard to find. But its higher order moments present more difficulties. Fortunately,
we can relate the distribution of $\rho$ directly to the distribution of the total number
of cyclic elements, $r$, with the help of yet another encoding of mappings.
Let $f \in F[n]$ be a mapping, and let $x$ be some fixed element in $\{1,\dots,n\}$.
The encoding $C_2(x,f)$ is a string of length $n + 1$ over the alphabet $\{1, 2, \dots, n\}$ of
the form

$$x\ f(x)\ \cdots\ f^i(x)\ f(a_1)\ \cdots\ f(a_{n-i}),$$

where $f^i(x)$ is the first repeated element in the sequence $x, f(x), f^2(x), \dots$, possibly
$x$ itself, and $a_1, \dots, a_{n-i}$ are the elements in $\{1,\dots,n\}$ that do not belong to this
sequence, written in increasing order. The index $i$ is in fact $\rho(x,f)$. For example,
for the function in Figure 2 we have

$C_2(1,f)$ = 1 4 15 5 6 3 15 3 16 13 9 4 10 7 16 9 7,

$C_2(5,f)$ = 5 6 3 15 5 4 3 15 16 13 9 4 10 7 16 9 7,

$C_2(9,f)$ = 9 9 4 3 15 15 6 3 16 13 4 10 7 16 9 5 7.

Lemma 15. Given a permutation invariant probability weight $w$ on $F[n]$, if

$$C_2(x, f) = x\, C_0(g)$$

then

$$\Pr(f) = \Pr(g).$$

Proof: Clearly, $C_2(x, f)$ consists of $x$ followed by a permutation of the trivial encoding of $f$. The premise of the Lemma means that the trivial encoding of $g$ is a permutation of the trivial encoding of $f$; hence $f$ and $g$ have the same probability because $w$ is permutation invariant. ∎

Theorem 16. Given a permutation invariant probability weight $w$ on $F[n]$,

$$\Pr(\rho = k) = \frac{n - k + 1}{n} \Pr(\tau = k - 1) + \frac{1}{n} \Pr(\tau \ge k).$$

Proof: We have

$$\Pr(\rho = k) = \Pr(\rho(x) = k \text{ and } x \text{ is not cyclic}) + \Pr(\rho(x) = k \text{ and } x \text{ is cyclic}).$$

The latter term is just $\Pr(\lambda(x) = k \text{ and } x \text{ is cyclic})$, which equals $\Pr(\tau \ge k)/n$ by Lemma 4. The former term is $1/n$ times the total weight of all mappings whose $C_2$-encoding has the form $x\, C_0(g)$, where $g$ is a mapping such that $\nu(g) = k - 1$ and $x$ is in $\{1, \ldots, n\}$ but is not one of the distinct elements $g(1), \ldots, g(k - 1)$.

Hence we have

$$\Pr(\rho = k) = \frac{n - k + 1}{n} \Pr(\nu = k - 1) + \frac{1}{n} \Pr(\tau \ge k),$$

and the theorem follows because the distributions of $\nu$ and $\tau$ are identical (Theorem 13). ∎
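The relation between the distributions of $\rho$ and $\tau$ can be checked exhaustively on a small domain. The sketch below is an illustration, not part of the original derivation: it assumes the uniform weight on $F[n]$ (which is permutation invariant), labels the elements $0, \ldots, n-1$, and uses hypothetical helper names `rho`, `cyclic_count`, and `check_theorem_16`. It tests the identity $\Pr(\rho = k) = \frac{n-k+1}{n}\Pr(\tau = k-1) + \frac{1}{n}\Pr(\tau \ge k)$ with exact rational arithmetic.

```python
from fractions import Fraction
from itertools import product


def rho(f, x):
    """Number of distinct elements in the sequence x, f(x), f(f(x)), ..."""
    seen = set()
    while x not in seen:
        seen.add(x)
        x = f[x]
    return len(seen)


def cyclic_count(f):
    """Number of cyclic elements of f (elements that lie on a cycle)."""
    n = len(f)
    image = set(range(n))
    # Applying f to the whole domain n times leaves exactly the cyclic elements.
    for _ in range(n):
        image = {f[y] for y in image}
    return len(image)


def check_theorem_16(n):
    """Compare both sides of the identity for the uniform weight on F[n]."""
    maps = list(product(range(n), repeat=n))
    total = Fraction(len(maps))
    pr_tau = [Fraction(0)] * (n + 2)
    pr_rho = [Fraction(0)] * (n + 2)
    for f in maps:
        pr_tau[cyclic_count(f)] += 1 / total
        for x in range(n):
            pr_rho[rho(f, x)] += 1 / (total * n)
    for k in range(1, n + 1):
        tail = sum(pr_tau[k:])  # Pr(tau >= k)
        rhs = Fraction(n - k + 1, n) * pr_tau[k - 1] + tail / n
        if pr_rho[k] != rhs:
            return False
    return True
```

Calling `check_theorem_16(4)` enumerates all 256 mappings on four points; larger $n$ quickly becomes expensive because the enumeration grows as $n^n$.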
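As a quick check we can compute

$$\sum_k \Pr(\rho = k) = \sum_k \left(1 - \frac{k-1}{n}\right) \Pr(\tau = k - 1) + \frac{1}{n} \sum_k \Pr(\tau \ge k) = \sum_k \left(1 - \frac{k}{n}\right) \Pr(\tau = k) + \frac{1}{n} E(\tau) = 1$$

and

$$E(\rho) = \sum_k k \left(1 - \frac{k-1}{n}\right) \Pr(\tau = k - 1) + \frac{1}{n} \sum_k k \sum_{j \ge k} \Pr(\tau = j)$$

$$= \sum_k (k + 1)\left(1 - \frac{k}{n}\right) \Pr(\tau = k) + \frac{1}{n} \sum_j \Pr(\tau = j) \sum_{1 \le k \le j} k$$

$$= E\left((\tau + 1)\left(1 - \frac{\tau}{n}\right)\right) + \frac{1}{n} E\left(\frac{\tau(\tau + 1)}{2}\right) = E(\tau) + 1 - \frac{E(\tau) + E(\tau^2)}{2n},$$

which is indeed $E(\lambda) + E(\mu)$.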

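The probability generating function for $\rho$, $R(z)$, is given by

$$R(z) = zC(z) - \frac{z^2}{n} C'(z) + \frac{z\,(C(z) - 1)}{n(z - 1)}.$$

For the higher order factorial moments of $\rho$ first note that

$$R(z) = zC(z) - \frac{z^2}{n} C'(z) + A(z), \tag{1.21}$$

where $A(z)$ is given by equation (9), so that it is enough to consider the derivatives of

$$D(z) = zC(z) - \frac{z^2}{n} C'(z).$$

We obtain that

$$D(w + 1) = (w + 1)\,C(w + 1) - \frac{(w + 1)^2}{n} C'(w + 1)$$

$$= (w + 1)(C_0 + C_1 w + C_2 w^2 + \cdots) - \frac{(w + 1)^2}{n}(C_1 + 2C_2 w + \cdots);$$

hence

$$D_l = C_l + C_{l-1} - \frac{(l + 1)C_{l+1} + 2l\,C_l + (l - 1)C_{l-1}}{n}.$$

From equation 11

$$A_l = \frac{C_l + C_{l+1}}{n},$$

so that

$$R_l = \left(1 - \frac{l - 1}{n}\right) C_{l-1} + \left(1 - \frac{2l - 1}{n}\right) C_l - \frac{l}{n}\, C_{l+1}. \tag{1.22}$$

In particular

$$R'(1) = 1 + \left(1 - \frac{1}{n}\right) C'(1) - \frac{1}{2n}\, C''(1), \tag{1.23}$$

as expected (equations (6) and (13)), and

$$R''(1) = \left(2 - \frac{2}{n}\right) C'(1) + \left(1 - \frac{3}{n}\right) C''(1) - \frac{2}{3n}\, C'''(1). \tag{1.24}$$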


1.8. The covariance of $\lambda$ and $\mu$

By definition, the covariance of $\lambda$ and $\mu$ is given by

$$\mathrm{cov}(\lambda, \mu) = E(\lambda\mu) - E(\lambda)E(\mu),$$

and their correlation is

$$\mathrm{cor}(\lambda, \mu) = \frac{\mathrm{cov}(\lambda, \mu)}{\sqrt{\mathrm{var}(\lambda)\,\mathrm{var}(\mu)}}.$$

It seems that $\lambda$ and $\mu$ must be negatively correlated for permutation invariant weights; if we pick a certain $x$ and a certain $f$, and if it turns out that $\mu(x, f)$ is larger than average, we expect that $\lambda(x, f)$ will be smaller than average because for each value of $\rho$, if $w$ is permutation invariant, then $\lambda$ and $\mu$ are almost identically distributed. However, there are permutation invariant weights such that the correlation of $\lambda$ and $\mu$ is positive; an example is presented in Appendix B.

Our goal in this section is to find the value of $\mathrm{cov}(\lambda, \mu)$ as a function of the moments of $\tau$.

We start from

$$\mathrm{var}(\rho) = E(\rho^2) - E(\rho)^2 = E(\lambda^2 + \mu^2 + 2\lambda\mu) - E(\lambda)^2 - E(\mu)^2 - 2E(\lambda)E(\mu)$$

$$= \mathrm{var}(\lambda) + \mathrm{var}(\mu) + 2\,\mathrm{cov}(\lambda, \mu).$$
Replacing the variance by its expression in terms of derivatives of the probability generating function, we obtain

$$R''(1) + R'(1) - R'(1)^2 = L''(1) + L'(1) - L'(1)^2 + M''(1) + M'(1) - M'(1)^2 + 2\,\mathrm{cov}(\lambda, \mu).$$

But $R'(1) = L'(1) + M'(1)$, and hence

$$R''(1) - 2L'(1)M'(1) = L''(1) + M''(1) + 2\,\mathrm{cov}(\lambda, \mu),$$

and finally

$$\mathrm{cov}(\lambda, \mu) = \tfrac{1}{2}\left(R''(1) - L''(1) - M''(1) - 2L'(1)M'(1)\right). \tag{1.25}$$

After expressing all the factorial moments in terms of the factorial moments of the total number of cyclic elements, $\tau$ (equations (5), (12), and (22)), we obtain (with the help of a computer) that


(1.26) 


1.9. A simple example - permutations

In this section, as a quick check, we shall examine a very simple permutation invariant weight. More intricate problems will be discussed in the following chapters.

Assume that all permutations are equally likely and all other mappings have probability 0. More precisely, the probability weight is defined by

$$w(f) = \begin{cases} 1/n!, & \text{if } f \text{ is a permutation;} \\ 0, & \text{otherwise.} \end{cases}$$

This weight is clearly permutation invariant.

All the elements are always cyclic, hence

$$C(z) = z^n, \qquad E(\tau) = n, \qquad E(\tau^2) = n^2, \qquad \mathrm{var}(\tau) = 0.$$
From Theorems 9 and 10, we obtain that

$$E(\lambda) = \frac{n + 1}{2},$$

and

$$E(\mu) = \frac{n + 1}{2} - \frac{1}{2n}(n + n^2) = 0,$$

as expected. Equation (8) results in

$$\mathrm{var}(\lambda) = \frac{n^2 - 1}{12},$$

and equation (26) confirms that $\mathrm{cov}(\lambda, \mu) = 0$. Theorem 8 gives

$$L(z) = \frac{1}{n} \sum_{1 \le k \le n} z^k,$$

and (not too surprisingly)

$$R(z) = \frac{z^{n+1} - z}{n(z - 1)} = L(z).$$

For the number of cycles, $\beta$, we obtain, from equation (15), that the probability generating function is

$$B(z) = \frac{1}{n!} \sum_k {n \brack k} z^k.$$

Equations (19) and (20) translate into

$$E(\beta) = H_n$$

and

$$\mathrm{var}(\beta) = H_n - H_n^{(2)},$$

a slightly less known fact.
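The values quoted for the permutation weight are easy to confirm by enumeration. The following sketch is only an illustration under stated assumptions: elements are labelled $0, \ldots, n-1$, the helper names `cycle_lengths` and `permutation_moments` are hypothetical, and the starting point is taken uniformly at random. It computes $E(\lambda)$, $\mathrm{var}(\lambda)$, and $E(\beta)$ exactly and can be compared with $(n+1)/2$, $(n^2-1)/12$, and $H_n$.

```python
from fractions import Fraction
from itertools import permutations


def cycle_lengths(p):
    """Return the list of cycle lengths of the permutation p on 0..n-1."""
    seen, lengths = set(), []
    for start in range(len(p)):
        if start in seen:
            continue
        length, x = 0, start
        while x not in seen:
            seen.add(x)
            x = p[x]
            length += 1
        lengths.append(length)
    return lengths


def permutation_moments(n):
    """Exact (E(lambda), var(lambda), E(beta)) when all permutations are equally likely."""
    perms = list(permutations(range(n)))
    count = Fraction(len(perms))
    e_lam = e_lam2 = e_beta = Fraction(0)
    for p in perms:
        lengths = cycle_lengths(p)
        e_beta += len(lengths) / count
        # A cycle of length c contains c starting points, each with lambda = c.
        e_lam += Fraction(sum(c * c for c in lengths), n) / count
        e_lam2 += Fraction(sum(c ** 3 for c in lengths), n) / count
    return e_lam, e_lam2 - e_lam ** 2, e_beta
```

For $n = 5$ the three returned values are $3$, $2$, and $137/60 = H_5$.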

Corollary 5 is the somewhat amazing truth that, no matter how many men mix up their hats,³ on average only one of them will get his hat back.

³ It is not my intention to be sexist, but women never mix their hats.
1.10. Other types of invariance

A weight function $w$ was called permutation invariant if for any permutation $p$ and any mapping $f$, we have $w(f) = w(p \circ f)$. In a similar manner we say that a weight function $w$ is labelling invariant if for any permutation $p$ and any mapping $f$, we have $w(f) = w(f \circ p)$.

Any mapping $f$ induces an equivalence relation over its domain, defined by $x \equiv y$ iff $f(x) = f(y)$. A weight function is labelling invariant if any two mappings that induce the same equivalence relation are equally likely.

A weight function $w$ is isomorphic invariant if for any permutation $p$ and any mapping $f$, we have $w(f) = w(p^{-1} \circ f \circ p)$. In other words, a weight function is isomorphic invariant if any two mappings that have the same (unlabelled) graph are equally likely.

Theorem 17. Any weight function, $w$, is equivalent to an isomorphic invariant weight function, $w'$, in the sense that $w$ and $w'$ imply the same distribution for $\mu$, $\lambda$, $\rho$, and $\tau$.

Proof: Define

$$w'(f) = \frac{1}{n!} \sum_p w(p^{-1} \circ f \circ p),$$

where $p$ ranges over all permutations of $\{1, \ldots, n\}$. ∎

This theorem simplifies the study of the possible probability generating functions for $\mu$, $\lambda$, $\rho$, and $\tau$. For instance, on $F[3]$, the most general probability generating functions for these quantities depend only on 7 parameters (which must sum to 1), corresponding to the probabilities of the 7 non-isomorphic mappings on 3 elements, and not on 27 parameters corresponding to the 27 mappings in $F[3]$. Such a study shows that the propositions 6, 7, and 16 are independent in the sense that for any subset of them there are (non-invariant) weight functions that satisfy all the propositions in the chosen subset, and do not satisfy the other propositions.

The three types of invariance defined so far (permutation invariant, labelling invariant, and isomorphic invariant) are clearly independent, because there are weight functions that have one property, but do not have the other two. However, if any two of the invariance properties are present, all three hold. For instance, permutation invariance and labelling invariance imply isomorphic invariance:

$$w(f) = w(p^{-1} \circ f) = w(p^{-1} \circ f \circ p),$$

and permutation invariance and isomorphic invariance imply labelling invariance:

$$w(f) = w(p \circ f) = w(p^{-1} \circ p \circ f \circ p) = w(f \circ p).$$
We say that a weight function is strongly invariant if it is both labelling invariant and permutation invariant. By Theorem 17 and the above observation, any permutation invariant weight function is equivalent (from the point of view of the distribution of $\mu$, $\lambda$, $\rho$, and $\tau$) to a strongly invariant weight function. Similarly, any labelling invariant weight is equivalent to a strongly invariant weight, and therefore all the relations between the distributions of $\mu$, $\lambda$, $\rho$, and $\tau$ that hold for permutation invariant weights also hold for labelling invariant weights.

Let $\Delta(f)$ be the multiset $\{d(1), d(2), \ldots, d(n)\}$, where $d(i)$ is the number of elements in $\{1, \ldots, n\}$ where $f$ takes the value $i$ (that is, the indegree of $i$ in the graph of $f$). With this definition, a weight function $w$ is strongly invariant if for any two mappings $f$ and $g$ that satisfy $\Delta(f) = \Delta(g)$ we have $w(f) = w(g)$.

Clearly for any mapping $f$, we must have $\sum_{1 \le i \le n} d(i) = n$; hence $\Delta(f)$ is just a partition of $n$. Therefore a strongly invariant weight is completely characterized by associating to each partition of $n$ a certain probability. For instance if $n = 3$ there are three partitions: $[1, 1, 1]$, $[2, 1]$, and $[3]$. Denoting their probabilities by $w[1, 1, 1]$, $w[2, 1]$, and $w[3]$, it follows that for any strongly invariant weight on $F[3]$, the generating function $C(z)$ must have the form

$$C(z) = \left(w[3] + \tfrac{1}{3} w[2, 1]\right) z + \tfrac{2}{3}\, w[2, 1]\, z^2 + w[1, 1, 1]\, z^3.$$
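The coefficients of $C(z)$ for a strongly invariant weight on $F[3]$ can be recovered by grouping the 27 mappings according to their indegree partition. The sketch below is an illustration only; it assumes elements labelled $0, \ldots, n-1$ and uses hypothetical helper names. Within each partition class all mappings are equally likely, so the conditional distribution of the number of cyclic elements is obtained by simple counting.

```python
from collections import Counter, defaultdict
from fractions import Fraction
from itertools import product


def indegree_partition(f):
    """Non-zero indegrees of f, sorted decreasingly (a partition of n)."""
    return tuple(sorted(Counter(f).values(), reverse=True))


def cyclic_count(f):
    """Number of cyclic elements of f."""
    image = set(range(len(f)))
    for _ in range(len(f)):
        image = {f[y] for y in image}
    return len(image)


def cyclic_distribution_by_partition(n):
    """For each indegree partition, the distribution of the number of cyclic elements."""
    tallies = defaultdict(Counter)
    for f in product(range(n), repeat=n):
        tallies[indegree_partition(f)][cyclic_count(f)] += 1
    return {
        part: {k: Fraction(c, sum(cnt.values())) for k, c in sorted(cnt.items())}
        for part, cnt in tallies.items()
    }
```

For $n = 3$ the class $[2, 1]$ splits as $1/3$ (one cyclic element) and $2/3$ (two cyclic elements), while $[3]$ always gives one cyclic element and $[1, 1, 1]$ always gives three.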


Chapter  2 

The  uniform  distribution  model 


The obvious permutation invariant weight on the space of finite functions, $F[n]$, is the uniform distribution. A considerable number of results are known about this situation; Appendix A contains a bibliography on random mappings that lists over twenty relevant papers. The main results were obtained by Harris [Harris60] and Stepanov [Stepanov69].

In  this  chapter  we  shall  use  the  concepts  of  the  first  chapter,  both  to  derive 
some  old  results  in  the  new  setting,  and  to  obtain  some  new  formulae.  The  last 
two  sections  do  not  make  use  of  the  permutation  invariant  property  of  the  uniform 
distribution,  but  share  with  the  first  chapter  the  use  of  combinatorics  on  words  as 
a  main  tool. 

2.1.  Preliminaries 

This section presents some mathematical entities that will be used later.

2.1.1. The r-Stirling numbers

Stirling numbers of the second kind are denoted by ${n \brace m}$; they are defined combinatorially as the number of partitions of the set $\{1, \ldots, n\}$ into $m$ non-empty disjoint unlabelled sets. Good expositions of their properties can be found, for example, in [Comtet74], [Riordan58], or [Jordan47].

The r-Stirling numbers of the second kind, ${n \brace m}_r$, count certain restricted partitions and are defined, for all integers $r \ge 0$, as the number of partitions of the set $\{1, \ldots, n\}$ into $m$ non-empty disjoint subsets, such that the numbers $1, 2, \ldots, r$ are in distinct subsets.

The properties of the r-Stirling numbers are discussed in [Broder84a]. They were also studied under different names and notations in [Nielsen23], [Carlitz80], and [Koutras82]. Their asymptotics were studied in detail in [IM65]. Here we shall need only the fact that


as $n \to \infty$, for fixed $m$ and $r$. (The notation $x!!$ means $x(x - 2)(x - 4) \cdots$.)

The r-Stirling numbers satisfy a recurrence similar to the recurrence for Stirling numbers, namely

2.1.2. The Q-series

Knuth defines the infinite Q-series in [Knuth85] as

$$Q_n(a_1, a_2, \ldots) = \sum_{k \ge 1} a_k \frac{n^{\underline{k}}}{n^k}. \tag{2.3}$$

For any given sequence $a_1, a_2, \ldots$, this function depends only on $n$. In particular, $Q_n(1, 1, 1, \ldots)$ is denoted $Q(n)$. The asymptotic behavior of $Q(n)$ is well understood ([Ramanujan12], [Knuth73a, §1.2.11.3]):

$$Q(n) = \sqrt{\frac{\pi n}{2}} - \frac{1}{3} + \frac{1}{12}\sqrt{\frac{\pi}{2n}} + O(n^{-1}). \tag{2.4}$$

The Q-series are relevant to many problems in the analysis of algorithms [Knuth85], for instance the representation of equivalence relations [KS78], hashing [Knuth73b, §6.4], interleaved memory [KR75], labelled tree enumeration [Moon71], optimal caching [Knuth85], permutations in situ [Stanford81], and random mappings [Knuth81, §3.1].

It can readily be shown that the Q-series satisfy the recurrence

$$Q_n(a_1, 2a_2, 3a_3, \ldots) = n\,Q_n(a_1, a_2 - a_1, a_3 - a_2, \ldots). \tag{2.5}$$

From this recurrence it follows that

$$Q_n(1, 2, 3, \ldots) = n;$$

$$Q_n(1^2, 2^2, 3^2, \ldots) = n\,Q(n); \tag{2.6}$$

$$Q_n(1^3, 2^3, 3^3, \ldots) = 2n^2 - n\,Q(n).$$
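Because $n^{\underline{k}}$ vanishes for $k > n$, a Q-series is a finite sum and can be evaluated exactly. The sketch below is illustrative (the names `q_series` and `Q` are hypothetical); it assumes the definition $Q_n(a_1, a_2, \ldots) = \sum_{k \ge 1} a_k\, n^{\underline{k}}/n^k$ and can be used to check the recurrence $Q_n(a_1, 2a_2, 3a_3, \ldots) = n\,Q_n(a_1, a_2 - a_1, \ldots)$ together with its three consequences for the sequences $k$, $k^2$, and $k^3$.

```python
from fractions import Fraction


def q_series(n, a):
    """Q_n(a(1), a(2), ...) = sum over k >= 1 of a(k) * n^(falling k) / n^k, exactly."""
    total = Fraction(0)
    term = Fraction(1)  # n^(falling k) / n^k for k = 0
    for k in range(1, n + 1):  # terms with k > n vanish
        term *= Fraction(n - k + 1, n)
        total += a(k) * term
    return total


def Q(n):
    """Q(n) = Q_n(1, 1, 1, ...)."""
    return q_series(n, lambda k: 1)
```

A short usage example: `q_series(7, lambda k: k ** 3)` equals `2 * 49 - 7 * Q(7)`, in agreement with the third consequence of the recurrence.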

In general there exist integral and positive coefficients $q_{m,j}$ such that

$$Q_n(1^m, 2^m, 3^m, \ldots) = q_{m,0}\, n^{[m]} - q_{m,1}\, n^{[m-1]} + q_{m,2}\, n^{[m-2]} - \cdots, \tag{2.7}$$

where

$$n^{[m]} = \begin{cases} n^{(m+1)/2}, & \text{if } m \text{ is odd;} \\ n^{m/2}\, Q(n), & \text{if } m \text{ is even.} \end{cases}$$

The leading coefficient has a simple expression, $q_{m,0} = (m - 1)!!$ (see [Knuth85] for details). Consequently for $s$ fixed,

$$Q_n(1^{2s}, 2^{2s}, \ldots) = \sum_{k \ge 1} k^{2s} \frac{n^{\underline{k}}}{n^k} = (2s - 1)!!\, n^s Q(n) + O(n^s), \tag{2.8}$$

and

$$Q_n(1^{2s+1}, 2^{2s+1}, \ldots) = \sum_{k \ge 1} k^{2s+1} \frac{n^{\underline{k}}}{n^k} = (2s)!!\, n^{s+1} + O(n^{s+1/2}). \tag{2.9}$$

There is an interesting relation between Q-series and r-Stirling numbers: For all $h \ge 1$ we have

and in particular for $r = 0$ (i.e., regular Stirling numbers) we have

Both these equations are an immediate consequence of the recurrences (2) and (5). Another noteworthy particularization of equation (5) is


-  flfc-i)  + 


(2.12) 


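2.1.3. Asymptotics of certain Q-series

In this section we present a fairly general method for obtaining the leading term in the asymptotic expansion of Q-series of the form $\sum_{k \ge 1} f(k)\, n^{\underline{k}}/n^k$ where $f$ is a differentiable function on the interval $[1, \infty)$, and where $f'(x) = O(x^\alpha)$ for some constant $\alpha$ and all $x \in [1, \infty)$. Clearly

$$f(x) = f(1) + \int_1^x f'(t)\, dt,$$

so this condition implies that

$$f(x) = \begin{cases} O(x^{\alpha + 1}), & \text{if } \alpha > -1; \\ O(\ln x), & \text{if } \alpha = -1; \\ O(1), & \text{if } \alpha < -1. \end{cases} \tag{2.13}$$

We start by noticing that

$$\frac{n^{\underline{k}}}{n^k} = \prod_{1 \le i < k} \left(1 - \frac{i}{n}\right) \le \exp\left(-\sum_{1 \le i < k} \frac{i}{n}\right) = e^{-k(k-1)/(2n)}, \tag{2.14}$$

and hence $f(k)\, n^{\underline{k}}/n^k$ is exponentially small for $k > n^{1/2 + \varepsilon}$, for any $\varepsilon > 0$.

For $k < n^{1/2 + \varepsilon}$ we can use the Stirling expansion to obtain

$$\ln \frac{n^{\underline{k}}}{n^k} = \left(n + \tfrac{1}{2}\right) \ln n - n - \left(n - k + \tfrac{1}{2}\right) \ln(n - k) + (n - k) - k \ln n + O(n^{-1})$$

$$= \left(n - k + \tfrac{1}{2}\right) \ln \frac{n}{n - k} - k + O(n^{-1}). \tag{2.15}$$

Expanding the logarithm in its Taylor series we get

$$\ln \frac{n}{n - k} = \frac{k}{n - k} - \frac{k^2}{2(n - k)^2} + O\left(\frac{k^3}{n^3}\right), \tag{2.16}$$

which combined with equation (15) leads, when $k < n^{1/2 + \varepsilon}$, to

$$\ln \frac{n^{\underline{k}}}{n^k} = -\frac{k^2}{2(n - k)} + O(n^{-1/2 + 3\varepsilon}) = -\frac{k^2}{2n} + O(n^{-1/2 + 3\varepsilon}). \tag{2.17}$$

Therefore

$$\frac{n^{\underline{k}}}{n^k} = e^{-k^2/(2n)} \left(1 + O(n^{-1/2 + 3\varepsilon})\right), \qquad k < n^{1/2 + \varepsilon}. \tag{2.18}$$

These results help us to prove the following

Lemma 1. Let $f$ be a differentiable function on $[1, \infty)$ such that $f'(x) = O(x^\alpha)$ on $[1, \infty)$. Then

$$\sum_{k \ge 1} f(k) \frac{n^{\underline{k}}}{n^k} = \sum_{k \ge 1} e^{-k^2/(2n)} f(k) \left(1 + O(n^{-1/2 + \varepsilon})\right),$$

for any $\varepsilon > 0$. (The constant implied by $O$ does not depend on $\varepsilon$.)

Proof: This follows from equation (18) with a proper choice of $\varepsilon$, and the fact that the sum of the terms corresponding to $k > n^{1/2 + \varepsilon}$ is exponentially small. ∎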

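The next step in our approximation is to convert the sum to an integral using Euler's summation formula. To evaluate the error term, we shall need the following

Lemma 2. As $n \to \infty$,

$$\int_1^\infty e^{-x^2/n} x^\beta\, dx = \begin{cases} n^{(\beta+1)/2} \left(\Gamma((\beta + 1)/2)/2 + O(n^{-(\beta+1)/2})\right), & \text{if } \beta > -1; \\ (\ln n)/2 - \gamma/2 + O(\ln(n)/n), & \text{if } \beta = -1; \\ O(1), & \text{if } \beta < -1. \end{cases}$$

Proof: For $\beta > -1$, by making the substitution $y \leftarrow x^2/n$ we obtain

$$\int_1^\infty e^{-x^2/n} x^\beta\, dx = \frac{n^{(\beta+1)/2}}{2} \int_{1/n}^\infty e^{-y} y^{(\beta-1)/2}\, dy = \frac{n^{(\beta+1)/2}}{2} \left(\Gamma\left(\frac{\beta + 1}{2}\right) - \int_0^{1/n} e^{-y} y^{(\beta-1)/2}\, dy\right),$$

and the last integral is $O(n^{-(\beta+1)/2})$, which proves the first case of the lemma.

For the case $\beta = -1$, we use the same substitution, $y \leftarrow x^2/n$, and integration by parts, to get

$$\int_1^\infty e^{-x^2/n} x^{-1}\, dx = \frac{1}{2} \int_{1/n}^\infty \frac{e^{-y}}{y}\, dy = \frac{e^{-y} \ln y}{2} \bigg|_{1/n}^\infty + \frac{1}{2} \int_{1/n}^\infty e^{-y} \ln y\, dy$$

$$= \frac{e^{-1/n} \ln n}{2} + \frac{1}{2} \int_0^\infty e^{-y} \ln y\, dy - \frac{1}{2} \int_0^{1/n} e^{-y} \ln y\, dy.$$

But it is known that

$$\int_0^\infty e^{-y} \ln y\, dy = -\gamma,$$

where $\gamma$ is Euler's constant ($0.5772\ldots$), and

$$\int_0^{1/n} e^{-y} \ln y\, dy = O\left(\frac{\ln n}{n}\right).$$

Hence

$$\int_1^\infty e^{-x^2/n} x^{-1}\, dx = \frac{e^{-1/n} \ln n}{2} - \frac{\gamma}{2} + O\left(\frac{\ln n}{n}\right) = \frac{\ln n}{2} - \frac{\gamma}{2} + O\left(\frac{\ln n}{n}\right).$$

Finally, if $\beta < -1$, then

$$\int_1^\infty e^{-x^2/n} x^\beta\, dx < e^{-1/n} \int_1^\infty x^\beta\, dx = O(1). \qquad ∎$$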
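We can now prove the main result of this section.

Theorem 3. Let $f$ be a differentiable function on $[1, \infty)$ such that $f'(x) = O(x^\alpha)$ on $[1, \infty)$. Then, for any positive $\varepsilon$, as $n \to \infty$,

$$\sum_{k \ge 1} f(k) \frac{n^{\underline{k}}}{n^k} = \left(\int_1^\infty e^{-x^2/(2n)} f(x)\, dx\right) \left(1 + O(n^{-1/2 + \varepsilon})\right) + O(g(n)),$$

where

$$g(n) = \begin{cases} n^{(\alpha+1)/2}, & \text{if } \alpha > -1; \\ \ln n, & \text{if } \alpha = -1; \\ 1, & \text{if } \alpha < -1. \end{cases}$$

Proof: By Euler's summation formula, if $h$ is differentiable, then

$$\sum_{k \ge 1} h(k) = \int_1^\infty h(x)\, dx - \frac{1}{2} h(x) \bigg|_1^\infty + \int_1^\infty \left(x - \lfloor x \rfloor - \tfrac{1}{2}\right) h'(x)\, dx$$

$$= \int_1^\infty h(x)\, dx - \frac{1}{2} h(x) \bigg|_1^\infty + O\left(\int_1^\infty |h'(x)|\, dx\right).$$

If we set $h(x) \leftarrow e^{-x^2/(2n)} f(x)$ then as $n \to \infty$,

$$h(x) \Big|_1^\infty = O(1),$$

and

$$h'(x) = \left(-\frac{x}{n} f(x) + f'(x)\right) e^{-x^2/(2n)}.$$

Hence we need now to evaluate on a case by case basis the integrals

$$A = \frac{1}{n} \int_1^\infty e^{-x^2/(2n)} |x f(x)|\, dx,$$

and

$$B = \int_1^\infty e^{-x^2/(2n)} |f'(x)|\, dx.$$

If $\alpha > -1$ then $x f(x) = O(x^{\alpha + 2})$, and by the first case of Lemma 2 we have $A = O(n^{(\alpha+3)/2})/n = O(n^{(\alpha+1)/2})$ and $B = O(n^{(\alpha+1)/2})$.

If $\alpha = -1$, then $x f(x) = O(x \ln x)$ and

$$A = O\left(\frac{1}{n} \int_1^\infty e^{-x^2/(2n)}\, x \ln x\, dx\right).$$

Integrating by parts and using the second case of Lemma 2, we obtain

$$\int_1^\infty e^{-x^2/(2n)}\, x \ln x\, dx = n \int_1^\infty e^{-x^2/(2n)}\, x^{-1}\, dx = O(n \ln n),$$

so that $A = O(\ln n)$. Also, directly from the second case of Lemma 2, $B = O(\ln n)$.

If $\alpha < -1$ then $x f(x) = O(x)$ and by the first case of Lemma 2, $A = O(n)/n = O(1)$. By the third case, $B = O(1)$.

We conclude that

$$\sum_{k \ge 1} e^{-k^2/(2n)} f(k) = \int_1^\infty e^{-x^2/(2n)} f(x)\, dx + O(g(n)),$$

where $g(n)$ has the desired form, and the theorem is proved by applying Lemma 1 to the last equation. ∎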

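As an example we can compute, for $s > 0$, the Q-series $\sum_{k \ge 1} k^s\, n^{\underline{k}}/n^k$. In this case $\alpha = s - 1$ and we obtain using the first case of Lemma 2

$$\sum_{k \ge 1} k^s \frac{n^{\underline{k}}}{n^k} = \left(\int_1^\infty e^{-x^2/(2n)} x^s\, dx\right) \left(1 + O(n^{-1/2 + \varepsilon})\right) + O(n^{s/2})$$

$$= 2^{(s-1)/2}\, n^{(s+1)/2}\, \Gamma((s + 1)/2) + O(n^{s/2 + \varepsilon}).$$

In particular if $s$ is an odd integer, the result is

$$2^{(s-1)/2}\, n^{(s+1)/2}\, ((s - 1)/2)! + O(n^{s/2 + \varepsilon}) = (s - 1)!!\, n^{(s+1)/2} + O(n^{s/2 + \varepsilon}),$$

and if $s$ is an even integer, the result is

$$(s - 1)!!\, n^{(s+1)/2} \sqrt{\frac{\pi}{2}} + O(n^{s/2 + \varepsilon}),$$

in agreement with equations (8) and (9).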
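The quality of the leading term can be gauged numerically. The following sketch is an illustration under stated assumptions: it evaluates $\sum_{k \ge 1} k^s\, n^{\underline{k}}/n^k$ in floating point and compares it with the leading term $2^{(s-1)/2}\, n^{(s+1)/2}\, \Gamma((s+1)/2)$; the function names are hypothetical, and the relative error is only expected to shrink slowly, roughly like $n^{-1/2}$.

```python
import math


def q_power_sum(n, s):
    """Floating-point value of sum over k >= 1 of k^s * n^(falling k) / n^k."""
    total, term = 0.0, 1.0
    for k in range(1, n + 1):
        term *= (n - k + 1) / n  # now term = n^(falling k) / n^k
        if term == 0.0:          # remaining terms underflow to zero
            break
        total += k ** s * term
    return total


def leading_term(n, s):
    """Leading term 2^((s-1)/2) * n^((s+1)/2) * Gamma((s+1)/2)."""
    return 2 ** ((s - 1) / 2) * n ** ((s + 1) / 2) * math.gamma((s + 1) / 2)
```

For instance, the ratio `q_power_sum(10_000, 2) / leading_term(10_000, 2)` is slightly below 1, reflecting the lower order term of $Q(n)$.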

2.1.4. Abelian identities

Sums of the type

$$A_n(x, y; p, q) = \sum_k \binom{n}{k} (x + k)^{k+p} (y + n - k)^{n-k+q}, \qquad p, q, n \text{ integers},$$

are called "Abelian binomial sums" by Riordan [Riordan68], [Riordan69].

With this notation, the famous Abel identity [Abel1826] becomes

$$A_n(x, y; -1, 0) = \sum_k \binom{n}{k} (x + k)^{k-1} (y + n - k)^{n-k} = \frac{(x + y + n)^n}{x}. \tag{2.19}$$

This is sometimes written as an identity in three variables

$$\sum_k \binom{n}{k}\, x (x + kz)^{k-1} (y + (n - k)z)^{n-k} = (x + y + nz)^n \tag{2.20}$$

via the substitutions $x \leftarrow x/z$, and $y \leftarrow y/z$. Equation (19) can also be written as

$$\sum_k \binom{n}{k}\, x (x + k)^{k-1} (n - k)^{n-k} = (x + n)^n,$$

which, after taking derivatives with respect to $x$ and setting $x \leftarrow 0$, becomes

$$\sum_{k \ge 1} \binom{n}{k}\, k^{k-1} (n - k)^{n-k} = n^n. \tag{2.21}$$

Another well known example of an identity involving Abelian sums is the Cauchy formula [Cauchy1826]

$$A_n(x, y; 0, 0) = \sum_k \binom{n}{k} (x + k)^k (y + n - k)^{n-k} = \sum_k \binom{n}{k}\, k!\, (x + y + n)^{n-k}, \tag{2.22}$$

which for $x = y = 0$ results in

$$\sum_k \binom{n}{k}\, k^k (n - k)^{n-k} = \sum_k \binom{n}{k}\, k!\, n^{n-k} = n^n (Q(n) + 1). \tag{2.23}$$
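Abelian binomial sums are finite, so the Abel identity and the Cauchy formula can be tested with exact rational arithmetic. The sketch below is illustrative; it assumes the definition $A_n(x, y; p, q) = \sum_k \binom{n}{k}(x+k)^{k+p}(y+n-k)^{n-k+q}$, the Abel identity in the form $A_n(x, y; -1, 0) = (x+y+n)^n/x$, and the Cauchy formula in the form $A_n(x, y; 0, 0) = \sum_k \binom{n}{k} k! (x+y+n)^{n-k}$. The names `abel_sum` and `cauchy_rhs` are hypothetical, and $x$ must be non-zero when $p < 0$.

```python
from fractions import Fraction
from math import comb, factorial


def abel_sum(n, x, y, p, q):
    """A_n(x, y; p, q), evaluated exactly (0^0 is taken to be 1)."""
    return sum(
        comb(n, k) * Fraction(x + k) ** (k + p) * Fraction(y + n - k) ** (n - k + q)
        for k in range(n + 1)
    )


def cauchy_rhs(n, x, y):
    """Right-hand side of the Cauchy formula: sum of C(n,k) k! (x+y+n)^(n-k)."""
    return sum(
        comb(n, k) * factorial(k) * Fraction(x + y + n) ** (n - k)
        for k in range(n + 1)
    )
```

With $x = y = 0$ the Cauchy formula reduces to $\sum_k \binom{n}{k} k^k (n-k)^{n-k} = \sum_k \binom{n}{k} k!\, n^{n-k}$, which `abel_sum(n, 0, 0, 0, 0)` reproduces.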

Riordan found a recurrence and a symmetry formula for $A_n$ and used them to prove these identities and also to derive similar "Abelian identities" iteratively for $p$ and $q$ between $-3$ and $3$. Another proof method, due to Françon [Françon74], is based on the FF-encoding applied to a suitably chosen family of mappings. In this manner Françon proved the Abel identity and the Cauchy formula by counting arguments. The author obtained a general explicit expression for the Abelian identities ([Broder83]), for all $p, q \ge 0$, using similar word counting arguments. For $x = y = 0$ the general identity is
In particular


Since 


k>l  ^  k>0 

(The  penultimate  step  is  based  on  equation  (5).) 

Another  Abelian  type  identity  that  we  shall  need  is 

,k 


I  >  0. 


(2.26) 


(This  and  similar  identities  are  proved  in  [Broder84b].) 

2.2.  The  distribution  of  the  number  of  cyclic  elements 


We have seen that the probability distribution of many of the random variables of interest is closely related to the distribution of the number of cyclic elements, $\tau$; hence, we start by computing this distribution.

Lemma 4. Given a uniform probability distribution on $F[n]$,

$$\Pr(\tau = k) = \frac{k\, n^{\underline{k}}}{n^{k+1}}.$$

Proof: Recall that the repetition index $\nu$ of a mapping $f$ is the maximum number $t$ such that $f(1), f(2), \ldots, f(t)$ are distinct. Clearly, if all functions are equally likely, then

$$\Pr(\nu = k) = \frac{n(n - 1) \cdots (n - k + 1)\, k}{n^{k+1}},$$

and by Theorem 1.13 this is the same as the probability that $\tau = k$. ∎
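The equality of the distributions of the repetition index and of the number of cyclic elements can be observed directly on a small domain. The sketch below is an illustration with hypothetical function names; it assumes the uniform distribution on $F[n]$, labels the elements $0, \ldots, n-1$, and compares both empirical distributions with the closed form $k\, n^{\underline{k}}/n^{k+1}$.

```python
from fractions import Fraction
from itertools import product


def cyclic_count(f):
    """Number of cyclic elements of f."""
    image = set(range(len(f)))
    for _ in range(len(f)):
        image = {f[y] for y in image}
    return len(image)


def repetition_index(f):
    """Largest t such that the first t values of f are pairwise distinct."""
    seen = set()
    for value in f:
        if value in seen:
            break
        seen.add(value)
    return len(seen)


def closed_form(n, k):
    """k * n^(falling k) / n^(k+1)."""
    falling = 1
    for j in range(k):
        falling *= n - j
    return Fraction(k * falling, n ** (k + 1))


def distributions(n):
    """Exact distributions of tau and nu under the uniform weight on F[n]."""
    pr_tau = [Fraction(0)] * (n + 1)
    pr_nu = [Fraction(0)] * (n + 1)
    weight = Fraction(1, n ** n)
    for f in product(range(n), repeat=n):
        pr_tau[cyclic_count(f)] += weight
        pr_nu[repetition_index(f)] += weight
    return pr_tau, pr_nu
```

For $n = 4$ both lists agree entry by entry with `closed_form(4, k)` for $k = 1, \ldots, 4$.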

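We immediately obtain

Theorem 5. Given a uniform probability distribution on $F[n]$, the probability generating function of the number of cyclic elements is

$$C(z) = \sum_{k \ge 1} \frac{k\, n^{\underline{k}}}{n^{k+1}}\, z^k. \tag{2.27}$$

Note that (equation (6)) we indeed have

$$\sum_k \Pr(\tau = k) = \frac{1}{n} Q_n(1, 2, 3, \ldots) = 1.$$

The generating function $C(z)$ has a nice form as a Q-series:

$$C(z) = \frac{1}{n} Q_n(z, 2z^2, 3z^3, \ldots) = Q_n(z, z^2 - z, z^3 - z^2, \ldots) = (z - 1)\, Q_n(1, z, z^2, \ldots) + 1. \tag{2.28}$$

The factorial moments of $\tau$ can be expressed in terms of the Q-series as

$$C^{(l)}(1) = l\, Q_n\left(0^{\underline{l-1}}, 1^{\underline{l-1}}, 2^{\underline{l-1}}, \ldots\right). \tag{2.29}$$

The same identity can be derived as follows. Recall that

$$C(w + 1) = C_0 + wC_1 + w^2 C_2 + \cdots,$$

where $C_j = C^{(j)}(1)/j!$. From equation (28) we have

$$C(w + 1) = w\, Q_n(1, w + 1, (w + 1)^2, \ldots) + 1,$$

and therefore

$$C_l = \sum_{k \ge 1} \binom{k - 1}{l - 1} \frac{n^{\underline{k}}}{n^k}, \tag{2.30}$$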
from which equation (29) follows immediately.

For fixed $l$, using the asymptotics of the Q-series (equations (7), (8), and (9)), we obtain

(2.31)

In particular

$$C'(1) = Q(n),$$

$$C''(1) = 2n - 2Q(n), \tag{2.32}$$

$$C'''(1) = 3nQ(n) - 9n + 6Q(n).$$

From here

$$E(\tau) = C'(1) = Q(n), \tag{2.33}$$

and

$$\mathrm{var}(\tau) = C''(1) + C'(1) - (C'(1))^2 = 2n - Q(n) - Q(n)^2. \tag{2.34}$$


2.3. The distribution of $\lambda$ and $\mu$

Theorem 6. Given a uniform probability distribution on $F[n]$, the length of the period and the length of the tail from a starting point chosen uniformly at random satisfy

$$\Pr(\lambda = i) = \frac{1}{n} \sum_{k \ge i} \frac{n^{\underline{k}}}{n^k}, \qquad 1 \le i \le n;$$

$$\Pr(\mu = i) = \frac{1}{n} \sum_{k > i} \frac{n^{\underline{k}}}{n^k}, \qquad 0 \le i < n.$$

Proof: The first relation follows immediately from Lemma 4 and Lemma 1.6; for the second we use Corollary 1.7 and equation (12) to obtain

rrU  =  0  =  iV=^fl-i)  =  i5;=J. 

"fe'*  v 

the case $i = 0$ follows from Theorem 1.8. ∎
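Corollary 7. Given a uniform probability distribution on $F[n]$, the probability generating functions for the length of the period and the length of the tail from a starting point chosen uniformly at random satisfy

$$L(z) = zM(z).$$

The factorial moments of $\lambda$ were computed via equation (1.5), using a computerized formal manipulation system (MAPLE). The first few are listed below.

$$L'(1) = \frac{Q(n) + 1}{2},$$

$$L''(1) = \frac{2n - Q(n) - 1}{3}, \tag{2.35}$$

$$L'''(1) = \frac{3nQ(n) - 7n + 2Q(n) + 2}{4}.$$

From here

$$E(\lambda) = \frac{Q(n) + 1}{2} = \frac{1}{2}\sqrt{\frac{\pi n}{2}} + \frac{1}{3} + O(n^{-1/2}), \tag{2.36}$$

$$\mathrm{var}(\lambda) = \frac{8n - 3Q(n)^2 - 4Q(n) - 1}{12} = \frac{(16 - 3\pi)n}{24} - \frac{1}{6}\sqrt{\frac{\pi n}{2}} - \frac{\pi}{48} + O(n^{-1/2}). \tag{2.37}$$

Since $L(z) = zM(z)$, the distribution of $\mu$ is a shift of the distribution of $\lambda$; we have

$$E(\mu) = E(\lambda) - 1 = \frac{1}{2}\sqrt{\frac{\pi n}{2}} - \frac{2}{3} + O(n^{-1/2}), \tag{2.38}$$

$$\mathrm{var}(\mu) = \mathrm{var}(\lambda) = \frac{8n - 3Q(n)^2 - 4Q(n) - 1}{12}, \tag{2.39}$$

and the higher order central moments are also equal.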
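The shift relation between the period and the tail can be verified by enumerating a small domain. The sketch below is illustrative and uses hypothetical names; it assumes the uniform distribution on $F[n]$ with a uniformly chosen starting point, and it compares the enumerated distribution of $\lambda$ with $\frac{1}{n}\sum_{k \ge i} n^{\underline{k}}/n^k$, checks $\Pr(\mu = i) = \Pr(\lambda = i + 1)$, and checks the mean $(Q(n) + 1)/2$ and the variance $(8n - 3Q(n)^2 - 4Q(n) - 1)/12$.

```python
from fractions import Fraction
from itertools import product


def period_and_tail(f, x):
    """Return (lambda, mu): cycle length and tail length of the orbit of x under f."""
    position, step = {}, 0
    while x not in position:
        position[x] = step
        x = f[x]
        step += 1
    mu = position[x]
    return step - mu, mu


def falling_ratios(n):
    """List r with r[k] = n^(falling k) / n^k for k = 0..n."""
    ratios = [Fraction(1)]
    for k in range(1, n + 1):
        ratios.append(ratios[-1] * Fraction(n - k + 1, n))
    return ratios


def lambda_mu_distributions(n):
    """Exact distributions of lambda and mu (lists indexed 0..n)."""
    pr_lam = [Fraction(0)] * (n + 1)
    pr_mu = [Fraction(0)] * (n + 1)
    weight = Fraction(1, n ** n * n)
    for f in product(range(n), repeat=n):
        for x in range(n):
            lam, mu = period_and_tail(f, x)
            pr_lam[lam] += weight
            pr_mu[mu] += weight
    return pr_lam, pr_mu
```

Here `sum(falling_ratios(n)[1:])` is $Q(n)$, so the closed forms can be evaluated with the same helper.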

2.4. The distribution of $\rho$

In the case of a uniform distribution on $F[n]$, the distribution of $\rho$ is easy to compute by direct arguments. Consider the sequence $x, f(x), f^2(x), \ldots$, for some fixed $x$. The probability of $k$ distinct values in this sequence is clearly

$$\Pr(\rho = k) = \frac{n(n - 1) \cdots (n - k + 1)\, k}{n^{k+1}} = \frac{k\, n^{\underline{k}}}{n^{k+1}}, \tag{2.40}$$

because all possible sequences are equally likely.

Of course, we can also compute it from Theorem 1.16. We obtain

$$\Pr(\rho = k) = \left(1 - \frac{k - 1}{n}\right) \frac{(k - 1)\, n^{\underline{k-1}}}{n^k} + \frac{1}{n} \sum_{j \ge k} \frac{j\, n^{\underline{j}}}{n^{j+1}}.$$

Using formula (12) for the sum, and expressing falling powers as binomial coefficients we get

$$\Pr(\rho = k) = \frac{k\, n^{\underline{k}}}{n^{k+1}}.$$

If we compare this result with Lemma 4 we see that for a uniform distribution on $F[n]$ the distributions of $\tau$ and of $\rho$ are identical. Hence, from equations (33) and (34), we have

$$E(\rho) = Q(n), \tag{2.41}$$

$$\mathrm{var}(\rho) = 2n - Q(n)^2 - Q(n). \tag{2.42}$$

From equation (1.26) it follows that

$$\mathrm{cov}(\lambda, \mu) = \frac{4n - 3Q(n)^2 - 2Q(n) + 1}{12} = \frac{(8 - 3\pi)n}{24} + \frac{48 - 9\pi}{432} + O(n^{-1/2}), \tag{2.43}$$

$$\mathrm{cor}(\lambda, \mu) = \frac{4n - 3Q(n)^2 - 2Q(n) + 1}{8n - 3Q(n)^2 - 4Q(n) - 1} = \frac{3\pi - 8}{3\pi - 16} - \frac{4(3\pi - 8)}{(3\pi - 16)^2} \sqrt{\frac{\pi}{2n}} + O(n^{-1}) \approx -0.21668895\ldots. \tag{2.44}$$

This is quite a strong negative correlation. It means that if we choose a function $f$ and a point $x$, both uniformly at random, then if $\lambda(x, f)$ is large, it is very probable that $\mu(x, f)$ is small.
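The covariance and the limiting correlation can be examined numerically. The sketch below is an illustration with hypothetical names: it computes the exact covariance of $\lambda$ and $\mu$ by enumerating $F[n]$ for small $n$, compares it with the closed form $(4n - 3Q(n)^2 - 2Q(n) + 1)/12$, and evaluates the ratio $(4n - 3Q^2 - 2Q + 1)/(8n - 3Q^2 - 4Q - 1)$ in floating point for large $n$, where it should approach $(3\pi - 8)/(3\pi - 16) \approx -0.2167$.

```python
import math
from fractions import Fraction
from itertools import product


def period_and_tail(f, x):
    """Return (lambda, mu) for the orbit of x under f."""
    position, step = {}, 0
    while x not in position:
        position[x] = step
        x = f[x]
        step += 1
    return step - position[x], position[x]


def exact_cov(n):
    """Exact cov(lambda, mu) under the uniform weight and a uniform starting point."""
    pairs = [period_and_tail(f, x) for f in product(range(n), repeat=n) for x in range(n)]
    m = Fraction(len(pairs))
    e_lam = sum(lam for lam, _ in pairs) / m
    e_mu = sum(mu for _, mu in pairs) / m
    e_prod = sum(lam * mu for lam, mu in pairs) / m
    return e_prod - e_lam * e_mu


def q_exact(n):
    """Q(n) as an exact fraction."""
    total, term = Fraction(0), Fraction(1)
    for k in range(1, n + 1):
        term *= Fraction(n - k + 1, n)
        total += term
    return total


def q_float(n):
    """Q(n) in floating point, stopping once the terms are negligible."""
    total, term = 0.0, 1.0
    for k in range(1, n + 1):
        term *= (n - k + 1) / n
        if term < 1e-18:
            break
        total += term
    return total


def correlation(n):
    """Closed-form correlation of lambda and mu, evaluated in floating point."""
    q = q_float(n)
    return (4 * n - 3 * q * q - 2 * q + 1) / (8 * n - 3 * q * q - 4 * q - 1)
```

For $n = 2$ the enumeration gives a covariance of $-1/16$, already negative on the smallest non-trivial domain.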
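2.5. The distribution of the number of cycles

The probability generating function for the number of cycles, $\beta$, is

$$B(z) = \sum_{k \ge 1} \frac{k\, n^{\underline{k}}}{n^{k+1}} \cdot \frac{1}{k!} \sum_j {k \brack j} z^j \tag{2.45}$$

from Lemma 4 and equation (1.15). Its first factorial moments are obtained from equations (1.19) and (1.20), via formula (5):

$$B'(1) = \sum_{k \ge 1} \frac{k\, n^{\underline{k}}}{n^{k+1}} H_k = \sum_{k \ge 1} \frac{n^{\underline{k}}}{n^k} \cdot \frac{1}{k}, \tag{2.46}$$

$$B''(1) = \sum_{k \ge 1} \frac{k\, n^{\underline{k}}}{n^{k+1}} \left(H_k^2 - H_k^{(2)}\right) = \sum_{k \ge 1} \frac{n^{\underline{k}}}{n^k} \left((H_k - H_{k-1})(H_k + H_{k-1}) - \frac{1}{k^2}\right) = \sum_{k \ge 1} \frac{n^{\underline{k}}}{n^k} \cdot \frac{2H_{k-1}}{k}. \tag{2.47}$$

The probability that a function chosen uniformly at random over $F[n]$ has a connected graph is (Corollary 1.12)

$$\sum_{k \ge 1} \frac{n^{\underline{k}}}{n^{k+1}} = \frac{Q(n)}{n}. \tag{2.48}$$

This is one of the earliest results about random mappings, due to Katz [Katz55].

To get the asymptotics of $B'(1)$ (i.e., the expected number of cycles) we use Theorem 3 with $\alpha = -2$ and Lemma 2 with $\beta = -1$. We obtain

$$B'(1) = \sum_{k \ge 1} \frac{n^{\underline{k}}}{n^k} \cdot \frac{1}{k} = \left(\int_1^\infty e^{-x^2/(2n)}\, x^{-1}\, dx\right) \left(1 + O(n^{-1/2 + \varepsilon})\right) + O(1)$$

$$= \left(\ln(2n)/2 + O(1)\right) \left(1 + O(n^{-1/2 + \varepsilon})\right) + O(1) = \frac{\ln n}{2} + O(1). \tag{2.49}$$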
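The expected number of cycles can be computed in three ways and compared. The sketch below is illustrative, with hypothetical names: it evaluates the series $\sum_{k \ge 1} n^{\underline{k}}/(k\, n^k)$ exactly, checks it against a brute-force average over $F[n]$ for small $n$, and compares a floating-point evaluation for large $n$ with the classical estimate $\frac{1}{2}(\ln 2n + \gamma)$, which is assumed here as the reference value for the $O(1)$ term.

```python
import math
from fractions import Fraction
from itertools import product

EULER_GAMMA = 0.5772156649015329


def count_cycles(f):
    """Number of cycles (equivalently, connected components) of the mapping f."""
    cyclic = set(range(len(f)))
    for _ in range(len(f)):
        cyclic = {f[y] for y in cyclic}
    seen, cycles = set(), 0
    for start in cyclic:
        if start not in seen:
            cycles += 1
            x = start
            while x not in seen:
                seen.add(x)
                x = f[x]
    return cycles


def expected_cycles_brute(n):
    """Average number of cycles over all n^n mappings."""
    total = sum(count_cycles(f) for f in product(range(n), repeat=n))
    return Fraction(total, n ** n)


def expected_cycles_exact(n):
    """Exact value of the series sum over k >= 1 of n^(falling k) / (k n^k)."""
    total, term = Fraction(0), Fraction(1)
    for k in range(1, n + 1):
        term *= Fraction(n - k + 1, n)
        total += term / k
    return total


def expected_cycles_float(n):
    """Floating-point value of the same series for large n."""
    total, term = 0.0, 1.0
    for k in range(1, n + 1):
        term *= (n - k + 1) / n
        if term < 1e-18:
            break
        total += term / k
    return total
```

For $n = 2$ both exact routines return $5/4$: the identity has two cycles and the other three mappings have one each.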


The average number of cycles in a random mapping was first computed (by very different methods) by M. Kruskal [Kruskal54]. He obtained the more precise estimate

$$B'(1) = \frac{\ln 2n + \gamma}{2} + o(1). \tag{2.50}$$
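The asymptotics of $B''(1)$ are somewhat more difficult. We first replace $H_{k-1}$ in equation (47) by $\ln k + \gamma + O(1/k)$, and then use equation (49) and Theorem 3 to obtain

$$B''(1) = 2 \sum_{k \ge 1} \frac{n^{\underline{k}}}{n^k} \cdot \frac{\ln k}{k} + 2\gamma \sum_{k \ge 1} \frac{n^{\underline{k}}}{n^k} \cdot \frac{1}{k} + O\left(\sum_{k \ge 1} \frac{n^{\underline{k}}}{n^k} \cdot \frac{1}{k^2}\right)$$

$$= 2 \sum_{k \ge 1} \frac{n^{\underline{k}}}{n^k} \cdot \frac{\ln k}{k} + \gamma \ln n + O(1). \tag{2.51}$$

Because $(\ln x / x)' = (1 - \ln x)/x^2$, we can use Theorem 3 with, say, $\alpha = -1.99$, to get

$$\sum_{k \ge 1} \frac{n^{\underline{k}}}{n^k} \cdot \frac{\ln k}{k} = \left(\int_1^\infty e^{-x^2/(2n)}\, \frac{\ln x}{x}\, dx\right) \left(1 + O(n^{-1/2 + \varepsilon})\right) + O(1). \tag{2.52}$$

With the substitution $y \leftarrow x^2/(2n)$ the integral becomes

$$\int_1^\infty e^{-x^2/(2n)}\, \frac{\ln x}{x}\, dx = \int_{1/(2n)}^\infty e^{-y}\, \frac{\ln(2ny)}{2} \cdot \frac{dy}{2y} = \frac{\ln 2n}{4} \int_{1/(2n)}^\infty \frac{e^{-y}}{y}\, dy + \frac{1}{4} \int_{1/(2n)}^\infty \frac{e^{-y} \ln y}{y}\, dy. \tag{2.53}$$

We already computed the first integral in the proof of Lemma 2. We obtained

$$\int_{1/(2n)}^\infty \frac{e^{-y}}{y}\, dy = \ln 2n - \gamma + O\left(\frac{\ln n}{n}\right). \tag{2.54}$$

We can integrate the second integral by parts:

$$\int_{1/(2n)}^\infty \frac{e^{-y} \ln y}{y}\, dy = \frac{e^{-y} (\ln y)^2}{2} \bigg|_{1/(2n)}^\infty + \frac{1}{2} \int_{1/(2n)}^\infty e^{-y} (\ln y)^2\, dy = -\frac{(\ln 2n)^2}{2} + O(1), \tag{2.55}$$

because

$$0 < \int_{1/(2n)}^\infty e^{-y} (\ln y)^2\, dy < \int_0^\infty e^{-y} (\ln y)^2\, dy < \int_1^\infty e^{-y} (\ln y)^2\, dy + \int_0^1 (\ln y)^2\, dy = O(1).$$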
Figure 2.1. An encoding example.

It is obvious that the encoding is 1-1. Hence it suffices to count how many possible legal words exist. First we remark that

1. The length of each word is exactly $s + k$.

2. Within the first $s + k - 1$ positions each of the $k$ letters appears at least once, and the first $s$ positions contain the letters $a_1, \ldots, a_s$ in that order.

To construct a legal word:

•  Partition the first $k + s - 1$ positions into $k$ non-empty subsets such that positions $1, \ldots, s$ are in different subsets. (Each subset corresponds to a certain letter.) This can be done in ${k + s - 1 \brace k}_s$ ways.

•  Associate to the subsets containing the positions $1, \ldots, s$ the letters $a_1, \ldots, a_s$ in this order. To each of the remaining $k - s$ subsets associate one of the remaining $k - s$ letters. ($(k - s)!$ ways)

•  Choose any letter for the last position. ($k$ possibilities)

From this construction it follows that

$$|F_k(S, k)| = k\, (k - s)!\, {k + s - 1 \brace k}_s, \tag{2.59}$$

and therefore, by equation (58),

$$|F_n(S, k)| = k\, {k + s - 1 \brace k}_s\, (n - s)^{\underline{k - s}}\, n^{n - k}, \tag{2.60}$$
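and finally

$$\Pr(\kappa(s) = k) = \frac{|F_n(S, k)|}{n^n} = k\, {k + s - 1 \brace k}_s\, \frac{(n - s)^{\underline{k - s}}}{n^k}. \tag{2.61}$$

As a quick check, note that by equation (10), we indeed have

$$\sum_k \Pr(\kappa(s) = k) = 1. \tag{2.62}$$

Of more interest is the expected value of the size of the transitive closure, that is the sum

$$E(\kappa(s)) = \sum_k k \Pr(\kappa(s) = k). \tag{2.63}$$

For fixed $s$, combining equations (1), (8), and (9), we obtain the estimate

$$E(\kappa(s)) = \frac{(2s - 1)!!}{(2s - 2)!!} \sqrt{\frac{\pi n}{2}} + O(1). \tag{2.64}$$

Similarly

$$E(\kappa(s)^2) = \frac{(2s)!!}{(2s - 2)!!}\, n + O(\sqrt{n}) = 2sn + O(\sqrt{n}). \tag{2.65}$$

This implies that, for fixed $s$, we have

$$\mathrm{var}(\kappa(s)) = 2sn - \left(\frac{(2s - 1)!!}{(2s - 2)!!}\right)^2 \frac{\pi n}{2} + O(\sqrt{n}). \tag{2.66}$$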

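For small values of $s$ it is possible to express ${k + s - 1 \brace k}_s$ as a polynomial in $k$ and use equation (7) to compute the exact values of the moments of $\kappa(s)$. The simplest way to find these expansions is to start from the generating function ([Broder84a])

$$\sum_k {k + r \brace m + r}_r \frac{z^k}{k!} = \frac{e^{rz} (e^z - 1)^m}{m!}, \qquad m \ge 0.$$

After simple transformations ($k \leftarrow i + m$; $m \leftarrow m - r$) we obtain

$$\sum_i {i + m \brace m}_r \frac{z^{i + m - r}}{(i + m - r)!} = \frac{e^{rz} (e^z - 1)^{m - r}}{(m - r)!}, \qquad m \ge r,$$

so that

$${i + m \brace m}_r = \frac{(i + m - r)!}{(m - r)!}\, \langle z^i \rangle\, e^{rz} \left(\frac{e^z - 1}{z}\right)^{m - r}, \qquad m \ge r, \tag{2.67}$$

where the notation $\langle z^i \rangle\, G(z)$ means the coefficient of $z^i$ in the power series expansion of $G(z)$ around $z = 0$.

The last expression can be easily computed within a formal manipulation system by taking the $i$th derivative with respect to $z$. Both $m$ and $r$ can be left as symbolic variables.

In our case we need to find ${k + s - 1 \brace k}_s$. We obtain

$${k + s - 1 \brace k}_s = (k - 1)^{\underline{s - 1}}\, \langle z^{s-1} \rangle\, e^{sz} \left(\frac{e^z - 1}{z}\right)^{k - s}, \qquad k \ge s > 0,$$

and in fact this formula holds for any $k \ge 1$ because if $1 \le k < s$ then $(k - 1)^{\underline{s - 1}}$ equals 0, and so does ${k + s - 1 \brace k}_s$.

The first expansions are

$${k \brace k}_1 = 1, \qquad {k + 1 \brace k}_2 = \tfrac{1}{2} k^2 + \tfrac{1}{2} k - 1, \qquad {k + 2 \brace k}_3 = \frac{3k^4 + 10k^3 - 27k^2 - 34k + 48}{24}, \tag{2.68}$$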

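and from here, via equations (63) and (65),

$$E(\kappa(1)) = Q(n); \qquad \operatorname{var}(\kappa(1)) = 2n - Q(n)^2 - Q(n); \tag{2.69}$$

$$E(\kappa(2)) = \frac{3nQ(n) - n - 2Q(n)}{2(n-1)} = \frac32 \sqrt{\frac{\pi n}{2}} - 1 + \frac58 \sqrt{\frac{\pi}{2n}} + O(n^{-1}); \tag{2.70}$$

$$E(\kappa(3)) = \frac{45n^2Q(n) - 25n^2 - 106nQ(n) + 38n + 48Q(n)}{24(n-1)(n-2)} = \frac{15}{8} \sqrt{\frac{\pi n}{2}} - \frac53 + \frac{131}{96} \sqrt{\frac{\pi}{2n}} + O(n^{-1}); \tag{2.71}$$

$$\operatorname{var}(\kappa(3)) = \frac{(768 - 225\pi)n}{128} - \frac58 \sqrt{\frac{\pi n}{2}} + \frac{17408 - 5895\pi}{2304} + O(n^{-1/2}).$$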
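The exact formulas for small $s$ can be checked by exhaustive enumeration for small $n$. The sketch below is only an illustration: the helper names `Q`, `closure_size` and `mean_closure` are hypothetical, the points are labelled $0, \ldots, n-1$, and the fixed set is taken to be $\{0, \ldots, s-1\}$.

```python
from fractions import Fraction
from itertools import product

def Q(n):
    """Q(n) = sum_{k>=1} n(n-1)...(n-k+1) / n^k, as an exact fraction."""
    total, term = Fraction(0), Fraction(1)
    for k in range(1, n + 1):
        term *= Fraction(n - k + 1, n)
        total += term
    return total

def closure_size(f, starts):
    """Size of the transitive closure of `starts` under the map f (a tuple)."""
    seen = set()
    for x in starts:
        while x not in seen:
            seen.add(x)
            x = f[x]
    return len(seen)

def mean_closure(n, s):
    """Exact E(kappa(s)) over all n^n maps, for the fixed set {0, ..., s-1}."""
    total = sum(closure_size(f, range(s)) for f in product(range(n), repeat=n))
    return Fraction(total, n ** n)

n = 5
q = Q(n)
print(mean_closure(n, 1) == q)
print(mean_closure(n, 2) == (3 * n * q - n - 2 * q) / (2 * (n - 1)))
print(mean_closure(n, 3) == (45 * n * n * q - 25 * n * n - 106 * n * q
                             + 38 * n + 48 * q) / (24 * (n - 1) * (n - 2)))
```

Enumeration costs $n^n$ closure computations, so it is only practical for very small $n$; its purpose is to confirm the rational expressions in $n$ and $Q(n)$, not the asymptotics.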


If both $s$ and $n$ go to infinity such that $s = o(n)$ it can be shown [Pittel83]


2.6.2.  Random  starting  points 


Now let's assume that instead of taking the transitive closure of a fixed set of size $s$, we start iterating a function $f$ from $s$ starting points chosen uniformly at random, where, as before, $f$ is chosen uniformly at random over $F[n]$. The size of the transitive closure of $s$ points so chosen is now denoted $\tilde\kappa(s)$. This notation is inspired by the fact that the actual number of distinct starting points is only approximately $s$, and in fact it is a random variable between 1 and $s$.

Let the chosen points be the sequence $S = (a_1, a_2, \ldots, a_s)$. We encode each function $f \in F[n]$ as a string of length $n + s$ over the alphabet $\{1, \ldots, n\}$ of the form

$$f \leftrightarrow f^0(a_1)\, f^1(a_1) \ldots f^{l_1}(a_1)\;\; f^0(a_2)\, f^1(a_2) \ldots f^{l_2}(a_2)\;\; \ldots\;\; f^0(a_s)\, f^1(a_s) \ldots f^{l_s}(a_s)\;\; f(b_1)\, f(b_2) \ldots,$$

where $l_j$ is the smallest iterate of $f$ such that $f^{l_j}(a_j)$ already appears in the string, and where $b_1, b_2, \ldots$ are the elements of the set $\{1, \ldots, n\} - f^*(S)$, in increasing order.

For  example,  the  encoding  of  the  function  in  Figure  1,  with 

$(a_1, a_2, a_3, a_4) = (3, 1, 3, 6)$,
is 

/♦-♦3T0749l0l936l52T 
The resulting encoding can be inverted, reconstructing the function and the starting sequence.

Assume that $|f^*(S)| = k$. That means that, in the encoding, the length of the prefix $f^0(a_1) \ldots f^{l_s}(a_s)$ is exactly $k + s$. The last letter must appear at least twice. Preceding it there is an arbitrary string of length $k + s - 1$ over an $n$ letter alphabet, containing $k$ distinct letters. Therefore we can construct a legal prefix as follows:

•  Partition the first $k + s - 1$ positions into $k$ non-empty subsets. (Each subset corresponds to a certain letter.) This can be done in ${k+s-1 \brace k}$ ways.

•  Choose $k$ letters and associate each of them to a certain subset. (There are $\binom{n}{k} k!$ possibilities.)

•  Choose any letter already chosen for the last position ($k$ possibilities).

Once the prefix is fixed, it can be completed in $n^{n-k}$ ways to form a legal encoding. Hence the probability of reaching $k$ points from $s$ random starting points is

$$\Pr(\tilde\kappa(s) = k) = \frac{k}{n^s}\, \frac{n^{\underline{k}}}{n^k} {k+s-1 \brace k}, \tag{2.73}$$

and the average number of reached points is

$$E(\tilde\kappa(s)) = \frac{1}{n^s} \sum_k k^2\, \frac{n^{\underline{k}}}{n^k} {k+s-1 \brace k}. \tag{2.74}$$


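Below are the expected value and the variance of $\tilde\kappa(s)$ for small $s$, computed as explained in the previous subsection.

$$E(\tilde\kappa(1)) = Q(n); \qquad \operatorname{var}(\tilde\kappa(1)) = 2n - Q(n)^2 - Q(n); \tag{2.75}$$

$$E(\tilde\kappa(2)) = \frac{3Q(n) - 1}{2} = \frac32 \sqrt{\frac{\pi n}{2}} - 1 + \frac18 \sqrt{\frac{\pi}{2n}} + O(n^{-1});$$

$$\operatorname{var}(\tilde\kappa(2)) = \frac{16n - 9Q(n)^2 - 8Q(n) + 1}{4} = \frac{(32 - 9\pi)n}{8} - \frac12 \sqrt{\frac{\pi n}{2}} + \frac{32 - 9\pi}{48} + O(n^{-1/2}); \tag{2.76}$$

$$E(\tilde\kappa(3)) = \frac{45nQ(n) - 25n + 2Q(n) + 2}{24n} = \frac{15}{8} \sqrt{\frac{\pi n}{2}} - \frac53 + \frac{23}{96} \sqrt{\frac{\pi}{2n}} + O(n^{-1});$$

$$\operatorname{var}(\tilde\kappa(3)) = \frac{(768 - 225\pi)n}{128} - \frac58 \sqrt{\frac{\pi n}{2}} + \frac{3584 - 1035\pi}{2304} + O(n^{-1/2}). \tag{2.77}$$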
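The distribution (2.73) can be compared with a brute-force count over all pairs (function, starting sequence) for small $n$. This is a minimal sketch; the function names are hypothetical and the points are labelled $0, \ldots, n-1$.

```python
from fractions import Fraction
from itertools import product
from math import perm

def stirling2(n, k):
    """Stirling number of the second kind, via the standard recurrence."""
    if n == k:
        return 1
    if k == 0 or k > n:
        return 0
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)

def formula(n, s, k):
    """Right-hand side of (2.73): k * S(k+s-1, k) * n^(falling k) / n^(k+s)."""
    return Fraction(k * stirling2(k + s - 1, k) * perm(n, k), n ** (k + s))

def brute_force(n, s):
    """Exact distribution of the closure size over all maps and start sequences."""
    counts = {}
    for f in product(range(n), repeat=n):
        for starts in product(range(n), repeat=s):
            seen = set()
            for x in starts:
                while x not in seen:
                    seen.add(x)
                    x = f[x]
            counts[len(seen)] = counts.get(len(seen), 0) + 1
    total = n ** (n + s)
    return {k: Fraction(c, total) for k, c in counts.items()}

dist = brute_force(3, 2)
print(all(dist.get(k, 0) == formula(3, 2, k) for k in range(1, 4)))
```

The same enumeration gives the mean, which for $s = 2$ should equal $(3Q(n) - 1)/2$.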


2.6.3.  Digression  -  a  combinatorial  identity 


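Let $S$ be a set of $s$ elements, each chosen uniformly at random. Another way to derive $\Pr(\tilde\kappa(s) = k)$ is to start from

$$\Pr(\tilde\kappa(s) = k) = \sum_{1 \le i \le s} \Pr(|S| = i)\, \Pr(\kappa(i) = k).$$

Using now equations (61) and (73), and the fact that $\Pr(|S| = i) = {s \brace i} n^{\underline{i}} / n^s$, we obtain

$$\frac{k}{n^s}\, \frac{n^{\underline{k}}}{n^k} {k+s-1 \brace k} = \sum_{1 \le i \le s} \frac{1}{n^s} {s \brace i} n^{\underline{i}}\, \frac{k}{n^k} {k+i-1 \brace k}_i (n-i)^{\underline{k-i}},$$

and therefore we have the identity

$${k+s-1 \brace k} = \sum_{1 \le i \le s} {s \brace i} {k+i-1 \brace k}_i.$$

Can we prove it by less intricate methods? The answer is yes and in fact we can prove a more general case.

Theorem 8.

$${m+s \brace k} = \sum_{i} {s \brace i} {m+i \brace k}_i.$$

Proof: Consider the partitions of $m$ white balls and $s$ red balls into $k$ non-empty subsets. We can construct them by first partitioning the $s$ red balls into $i$ non-empty subsets, and clumping each subset into a big red ball; then we add the big red balls to the white balls and partition all of them into $k$ non-empty subsets, taking care to keep the big red balls in separate subsets. ∎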
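Theorem 8 is easy to test numerically. The sketch below assumes the standard recurrence for r-Stirling numbers of the second kind (${n \brace k}_r = k {n-1 \brace k}_r + {n-1 \brace k-1}_r$ for $n > r$, with ${r \brace k}_r = [k = r]$); the function name `r_stirling2` is a hypothetical choice.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def r_stirling2(n, k, r):
    """Partitions of {1..n} into k non-empty blocks with 1..r in distinct blocks."""
    if n < r or k < r or k > n:
        return 0
    if n == r:
        return 1 if k == r else 0
    return k * r_stirling2(n - 1, k, r) + r_stirling2(n - 1, k - 1, r)

def theorem8_holds(m, s, k):
    """Check {m+s, k} = sum_i {s, i} * {m+i, k}_i for one triple (m, s, k)."""
    lhs = r_stirling2(m + s, k, 0)
    rhs = sum(r_stirling2(s, i, 0) * r_stirling2(m + i, k, i)
              for i in range(1, s + 1))
    return lhs == rhs

print(all(theorem8_holds(m, s, k)
          for m in range(6) for s in range(1, 6) for k in range(1, m + s + 1)))
```

With $r = 0$ the function returns the ordinary Stirling numbers, so the same routine serves both sides of the identity.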
2.7.  The  number  of  ancestors  of  one  point

We say that $y$ is an ancestor of $x$ in $f$, if there exists an $i \ge 0$ such that $f^i(y) = x$. The number of ancestors of a given point is denoted $\alpha$. In this section we shall compute the probability distribution of $\alpha$ assuming that all functions in $F[n]$ are equally likely.

Let $A$ be the set of ancestors of a fixed point $x$. Any mapping such that $x$ has exactly $k$ ancestors can be constructed as follows:

•  Choose the $k - 1$ elements in $A - \{x\}$. (There are $\binom{n-1}{k-1}$ possibilities.)

•  Using all the elements of $A$, construct a labelled tree rooted at $x$ ($k^{k-2}$ possibilities).

•  Choose a random function on the elements not in $A$. (There are $(n-k)^{n-k}$ possibilities.)

•  Choose some value for $f(x)$ ($n$ possibilities).

There are $\binom{n-1}{k-1}\, n\, k^{k-2} (n-k)^{n-k}$ ways to carry out this construction, and therefore

$$\Pr(\alpha = k) = \frac{1}{n^n} \binom{n}{k} k^{k-1} (n-k)^{n-k}, \qquad k > 0. \tag{2.78}$$

It is reassuring to notice that by equation (21) we indeed have $\sum_k \Pr(\alpha = k) = 1$.

Remark that the probability that all the points are ancestors of a certain element $x$ is just $1/n$, which is the same as the probability that $x$ is a fixed point (see Corollary 1.5). This suggests looking for a bijection between functions where $x$ has $n$ ancestors and functions where $x$ is a fixed point. Here is one possibility. Assume that $x$ has $n$ ancestors in the graph of a certain function $f$, which means that $f$ has just one cycle and $x$ is included in it. Suppose that the cycle has the form $x \to a_1 \to a_2 \to \cdots \to a_k \to x$. To the function $f$ we associate a function $g$, identical to $f$ except for the points $a_1, \ldots, a_k$ and $x$. The cycles of $g$ are built by making $x$ a fixed point, then splitting the string $a_1 a_2 \cdots a_k$ whenever a new maximum is encountered, and then considering each substring as a cycle. This correspondence can be reversed and therefore defines a bijection. In fact, the same idea works for any permutation invariant weight, and therefore we have

Theorem 9. For any permutation invariant weight the probability that all the elements are ancestors of an element chosen uniformly at random is $1/n$. ∎
From  equations  (78)  and  (23)  we  obtain  that


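$$E(\alpha) = \frac{1}{n^n} \sum_k \binom{n}{k} k^k (n-k)^{n-k} = Q(n), \tag{2.79}$$

$$E(\alpha^2) = \frac{1}{n^n} \sum_k \binom{n}{k} k^{k+1} (n-k)^{n-k} = \sum_k \frac{n^{\underline{k}}}{n^k}\, \frac{k(k+1)}{2} = \frac{nQ(n) + n}{2}. \tag{2.80}$$

From here

$$\operatorname{var}(\alpha) = \frac{nQ(n) + n - 2Q(n)^2}{2}. \tag{2.81}$$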


The expected value for $\alpha$ is not unexpected; it can also be argued as follows. Recall that $\rho(x)$ is the number of descendants of $x$. For any function $f$, whenever $y$ is a descendant of $x$, $x$ is an ancestor of $y$. Hence

$$\sum_x \rho(x, f) = \sum_y \alpha(y, f),$$

and therefore, for any weight distribution, $w$,

$$E(\rho) = \sum_f \frac{w(f)}{n} \sum_x \rho(x, f) = \sum_f \frac{w(f)}{n} \sum_x \alpha(x, f) = E(\alpha).$$

In particular for the uniform distribution $E(\rho) = Q(n)$.
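Both the double-counting argument and the distribution (2.78) can be confirmed by enumeration for small $n$. The following is only an illustrative sketch with hypothetical helper names; points are labelled $0, \ldots, n-1$ and the fixed point is taken to be 0.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import comb

def descendants(f, x):
    """Set of points reachable from x (including x) under the map f."""
    seen = set()
    while x not in seen:
        seen.add(x)
        x = f[x]
    return seen

def ancestor_distribution(n, x=0):
    """Exact distribution of the number of ancestors of x over all n^n maps."""
    counts = Counter()
    for f in product(range(n), repeat=n):
        desc = [descendants(f, y) for y in range(n)]
        alpha = [sum(1 for y in range(n) if z in desc[y]) for z in range(n)]
        # Pairs (y, z) with z reachable from y are counted once from each side.
        assert sum(len(d) for d in desc) == sum(alpha)
        counts[alpha[x]] += 1
    return {k: Fraction(c, n ** n) for k, c in sorted(counts.items())}

n = 4
dist = ancestor_distribution(n)
predicted = {k: Fraction(comb(n, k) * k ** (k - 1) * (n - k) ** (n - k), n ** n)
             for k in range(1, n + 1)}
print(dist == predicted)
print(sum(k * p for k, p in dist.items()))   # E(alpha), to compare with Q(n)
```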

2.8.  The  number  of  ancestors  of  a  set  of  points 

Let now $A$ be the set of ancestors of a fixed set of $s$ elements, $S = \{a_1, a_2, \ldots, a_s\}$, with $a_1 < a_2 < \cdots < a_s$. Let $\alpha(s)$ be the size of $A$. We want to determine the probability distribution of $\alpha(s)$. An arbitrary mapping such that $|A| = k$ can be constructed as follows:

•  Choose the $k - s$ elements in $A - S$. (There are $\binom{n-s}{k-s}$ possibilities.)

•  Assume for the time being that $f$ is such that $a_1 \to a_2 \to \cdots \to a_s \to a_1$. Choose for the $k$ elements in $A$, a random mapping having exactly the cycle given above. It is easy to show via the FF-encoding that there are $s\, k^{k-s-1}$ such mappings.

•  Choose a random mapping for the $n - k$ elements not in $A$. (There are $(n-k)^{n-k}$ possibilities.)

•  Assign arbitrary values to $f$ at the points $a_1, a_2, \ldots, a_s$ ($n^s$ possibilities).

From this construction it follows that

$$\Pr(\alpha(s) = k) = \frac{s}{n^{n-s}} \binom{n-s}{k-s} k^{k-s-1} (n-k)^{n-k}. \tag{2.82}$$


As a quick check, note that

$$\sum_k \Pr(\alpha(s) = k) = 1 \tag{2.83}$$

from Abel's identity (equation (20)) when $x \leftarrow s$, $n \leftarrow n - s$, and $y \leftarrow 0$.

For  the  moments  of  the  distribution  of  a  we  compute 

^Pr(a(s)  =  k)k^  =  ^  -  k)""* 

k  // 

by  equation  (26).  For  fixed  s  and  fixed  /  >  1  as  n  — ►  00  this  is  (equations  (1),  (8), 
and  (9)) 


— W^-  '  ' 

3(2i  —  3)!!n*~*  .  .  j 


(2.84) 
We can express the first moments as a function of $Q(n)$, by expanding the r-Stirling number as a polynomial in $k$ and adding and subtracting the missing terms. We obtain that


k>*  '  l<k<«  ' 


=  ^  (0(")  -  (*  -  ‘)(» +  o("‘‘)))  =  +  T  - 

(2.85) 

and  in  a  similar  manner 


„/  ,  .jx  an*  n-  f/:  + 11  sn  fnn  an  /-\ 

E(aW’)  =  ^|:;^{  t  |.  =  tVT  +  T  +  °(^)- 

From  the  last  two  equations,  with  the  help  of  a  computer,  we  obtain 

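For small $s$ we can get nicer formulae:

$$E(\alpha(1)) = Q(n),$$

$$\operatorname{var}(\alpha(1)) = \frac{nQ(n) + n - 2Q(n)^2}{2} = \frac{n}{2} \sqrt{\frac{\pi n}{2}} + \frac{(2 - 3\pi)n}{6} + \frac{17}{24} \sqrt{\frac{\pi n}{2}} + O(1);$$

$$E(\alpha(2)) = \frac{2nQ(n) - 2n}{n - 1} = 2 \sqrt{\frac{\pi n}{2}} - \frac83 + \frac{13}{6} \sqrt{\frac{\pi}{2n}} + O(n^{-1}),$$

$$\operatorname{var}(\alpha(2)) = \frac{n^3Q(n) + n^3 - 4n^2Q(n)^2 + 5n^2Q(n) - 5n^2 + 2nQ(n)}{(n-1)^2} = n \sqrt{\frac{\pi n}{2}} + \frac{(2 - 6\pi)n}{3} + \frac{39}{4} \sqrt{\frac{\pi n}{2}} + O(1);$$

$$E(\alpha(3)) = \frac{3n^2Q(n) - 6n^2 + 3n}{(n-1)(n-2)}.$$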


(2.86) 


(2.87) 


(2.88) 


(2.89) 


(2.90) 


Chapter  3

Pollard’s  factorization  method


Let $p$ be a factor of a large integer $N$. Pollard's Monte Carlo factorization algorithm ([Pollard75], [Brent80]) finds $p$ in average time $O(\sqrt{p})$. Pollard suggested a tuning of his method for the case when a nontrivial factor $d$ of $p - 1$ is known, and conjectured that its running time is $O(\sqrt{p/d})$. Pollard and Brent [BP81] used this improved method to factorize the eighth Fermat number, $2^{256} + 1$, using $d = 1024$, and recently Gold and Sattler [GS83] ran a series of empirical tests that agree with Pollard's conjecture. Of course, in general, no such $d$ is known, but the improvement is relevant whenever $p - 1$ has small factors, whether they are known or not [GS83].

Pollard's method is quite important in practice because although there are several factorization algorithms that are asymptotically faster, they do not take advantage of the existence of small factors. (See [Pomerance82] for a survey; the best current bound is $O(\exp(\sqrt{\ln N \ln\ln N}))$, due to Schnorr and Lenstra [SL84].) Hence it is preferable to use Pollard's method first, to isolate the small factors, and then to switch to a more sophisticated method. Another advantage of this method is that it is extremely simple and can be implemented even on a hand calculator. It is also possible to have a large number of simple processors running Pollard's algorithm on the same composite $N$, with no communication among processors. (The expected speed-up is discussed in section 6.)

In this chapter, we shall prove Pollard's conjecture under a certain randomness model.


3.1.  Pollard’s  factorization  method

A well known method for determining $\lambda(x_0, f)$ and $\mu(x_0, f)$, for a given $x_0$, is the following algorithm, due to Floyd [Knuth81, ex. 3.1.6]:

    x := x0;  y := x0;  j := 0;
    repeat  { now x = x_j and y = x_{2j} }
        j := j + 1;
        x := f(x);
        y := f(f(y));
    until x = y.

Why does this work? There exists a $j > 0$ such that $x_j = x_{2j}$. The minimum such value is $j = \lceil \mu/\lambda \rceil \lambda$, for $\mu > 0$, and $j = \lambda$ for $\mu = 0$. Hence $j = O(\lambda + \mu)$. Knowing $j$ we can easily determine $\lambda$ and $\mu$, in time $O(\lambda + \mu)$. Therefore the whole algorithm takes time $O(\lambda + \mu) = O(\rho)$. If all functions are equally likely then according to the results of Chapter 2, $E(\rho) = \sqrt{\pi n/2} + O(1)$.
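The remark that $\lambda$ and $\mu$ follow easily once $j$ is known can be made concrete. The sketch below is one common way to finish the computation (not necessarily the one intended in [Knuth81]); the function name `floyd` and the example map are hypothetical.

```python
def floyd(f, x0):
    """Return (lam, mu): cycle length and tail length of x0 under f."""
    # Phase 1: find the first j > 0 with x_j == x_{2j}; j is a multiple of lam.
    x, y = f(x0), f(f(x0))
    while x != y:
        x, y = f(x), f(f(y))
    # Phase 2: x_i == x_{i+j} holds exactly when i >= mu, so advance both
    # pointers one step at a time from (x_0, x_j) until they agree.
    mu, x = 0, x0
    while x != y:
        x, y = f(x), f(y)
        mu += 1
    # Phase 3: walk once around the cycle to measure lam.
    lam, y = 1, f(x)
    while x != y:
        y = f(y)
        lam += 1
    return lam, mu

table = [1, 2, 3, 4, 2]          # 0 -> 1 -> 2 -> 3 -> 4 -> 2: tail 2, cycle 3
print(floyd(lambda v: table[v], 0))
```

Each phase performs $O(\lambda + \mu)$ evaluations of $f$ and stores only a constant number of values.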

There are more efficient algorithms for this problem, based on storing more values of $f$ in memory (see [SSY82] and [Fitch82] for detailed discussions). However, the benefits of the improved versions are not directly applicable to factoring algorithms and do not change the essence of the analysis below.

Pollard's factorization method is based on Floyd's algorithm; $f$ is chosen to be some polynomial $P(x) \bmod N$, where $N$ is the number to be factored. The stopping condition is also modified as follows:

    x := x0;  y := x0;  j := 0;
    repeat  { now x = P^(j)(x0) mod N and y = P^(2j)(x0) mod N }
        j := j + 1;
        x := P(x) mod N;
        y := P(P(y)) mod N;
    until gcd(|x − y|, N) > 1.
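The loop above translates almost line by line into Python. This is a minimal sketch assuming the quadratic polynomial $P(x) = x^2 + c$; the function name `pollard_rho` and the default values of `x0` and `c` are illustrative choices, not part of the method's definition.

```python
from math import gcd

def pollard_rho(N, x0=2, c=1):
    """Return a non-trivial factor of N, or None if this (x0, c) fails."""
    def P(v):
        return (v * v + c) % N

    x = y = x0
    while True:
        x = P(x)                 # x = P^(j)(x0) mod N
        y = P(P(y))              # y = P^(2j)(x0) mod N
        g = gcd(abs(x - y), N)
        if g > 1:
            # g == N is the improbable failure case: retry with another x0 or c.
            return g if g < N else None

factor = pollard_rho(8051)
print(factor, 8051 // factor)
```

The loop always terminates, because the sequence modulo $N$ is eventually periodic and then $x = y$ gives a gcd equal to $N$ at the latest.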

Assume that $p$ divides $N$. By construction $x_{i+1} \equiv P(x_i)$ (modulo $N$), therefore $x_{i+1} \equiv P(x_i)$ (modulo $p$). The second congruence implies that for a certain $j$ we have $x_j \equiv x_{2j}$ (modulo $p$). At this point $\gcd(|x_j - x_{2j}|, N)$ is either $N$ or a proper factor of $N$. The first case can be shown to be improbable, but if it happens we can try another starting point, or another polynomial $P$.

So far, we have not discussed what polynomial $P$ to choose. If nothing is known about the factors of $N$ then we can take $P(x) = x^2 + c$, for some constant $c \ne 0$, (see [BP81] and [Knuth81, §4.5.4] for some precautions) but if a factor $d > 2$ of $p - 1$ is known, Pollard [Pollard75] suggests taking $P(x) = x^d + c$. In this case, in the graph of $P(x) \bmod p$ all the indegrees are either 0 or $d$ (except for $c$). If no factor of $p - 1$ is known, but $p - 1$ contains small factors, we can use $P(x) = x^a + c$, with $a$ a product of small primes. Assume that $\gcd(p - 1, a) = d > 1$; then again, in the graph of $P(x) \bmod p$ all the indegrees are either 0 or $d$ (except for $c$). Pollard conjectured that in this case $E(\rho) = O(\sqrt{n/(d-1)})$, a $\sqrt{d-1}$ improvement over an arbitrary choice for $P(x)$, at a cost of $O(\log d)$ more operations per iteration.
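The claim about indegrees is easy to inspect for a small prime. The sketch below tabulates, for the map $x \mapsto x^a + c \pmod p$ on $\{0, \ldots, p-1\}$, how many nodes have each indegree; the function name `indegree_profile` and the sample parameters are hypothetical.

```python
from collections import Counter

def indegree_profile(p, a, c):
    """Map each indegree to the number of nodes of x -> x^a + c (mod p) having it."""
    indeg = Counter((pow(x, a, p) + c) % p for x in range(p))
    return Counter(indeg.get(v, 0) for v in range(p))

# p = 13, so p - 1 = 12.  With a = 4, gcd(a, p - 1) = 4: three nodes have
# indegree 4, the node c has indegree 1 (its only preimage is 0), the rest 0.
print(indegree_profile(13, 4, 1))
# With a = 10, gcd(a, p - 1) = 2: six nodes have indegree 2.
print(indegree_profile(13, 10, 5))
```

The single node of indegree 1 is the exceptional point $c$ mentioned in the text; it arises from $x = 0$.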

To determine the expected running time of the algorithm we must compute the expected cycle length, and the expected tail length in such a mapping. As in other analyses of factorization algorithms, we assume that all such mappings (that is, where all the indegrees are either 0 or $d$) are equiprobable. Under this model, we shall prove that Pollard's conjecture is true. Deciding on the validity of such a model is beyond the current state of knowledge in number theory, but experimental results ([BP81], [Knuth81], [GS83], [Pollard82]) seem to confirm it.

Variants of this algorithm (e.g., [Brent80]) depend in a slightly different way on $\lambda$ and $\mu$ but their running time is still essentially proportional to $\rho$.

3.2.  The  constant  indegree  model 


Let's consider the family $\mathcal{F}$ of functions $f : \{1, \ldots, n\} \to \{1, \ldots, n\}$ such that exactly $n/d$ nodes have indegree $d$. (Here $n$ represents the number $p - 1$ in the factorization problem.) This family is not empty only if $n/d$ is an integer, say $m$; then $\mathcal{F}$ has cardinality $\binom{n}{m} n!/(d!)^m$. We define a permutation invariant probability weight as follows: to each function in $\mathcal{F}$ we assign probability $(d!)^m / (\binom{n}{m} n!)$ and to all other functions $f : \{1, \ldots, n\} \to \{1, \ldots, n\}$ we assign probability 0. (In fact this weight is strongly invariant.)


For this probability weight it is easy to see that

$$\Pr(\nu \ge k) = \frac{1}{|\mathcal{F}|} \binom{n}{m} \binom{m}{k}\, k!\, \frac{(n-k)!}{((d-1)!)^k (d!)^{m-k}} \tag{3.1}$$

because the trivial encoding of a function $f$ with $\nu \ge k$ can be constructed as follows:

•  Choose the $m$ elements with indegree $d$. (There are $\binom{n}{m}$ possibilities.)

•  Choose $k$ elements out of these $m$ elements, to be the first $k$ letters of $C_0(f)$. (There are $\binom{m}{k}$ ways.)

•  Permute the first $k$ elements (in one of $k!$ ways).

•  The remaining $n - k$ letters are a permutation of $k$ letters repeated $d - 1$ times and $m - k$ letters repeated $d$ times. (Hence the number of possibilities is $(n-k)! / (((d-1)!)^k (d!)^{m-k})$.)

Expanding equation (1) we obtain

$$\Pr(\nu \ge k) = \binom{m}{k} \frac{k!\,(n-k)!\,(d!)^m}{((d-1)!)^k (d!)^{m-k}\, n!} = d^k\, \frac{m!}{(m-k)!}\, \frac{(n-k)!}{n!}. \tag{3.2}$$
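For very small $n$ the family can be enumerated directly, which gives a check on both its cardinality and equation (3.2), written here in the equivalent form $d^k \binom{m}{k} / \binom{n}{k}$. The helper names below are hypothetical and the points are labelled $0, \ldots, n-1$.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import comb, factorial

def cyclic_count(f):
    """Number of cyclic nodes of the map f (a tuple on 0..n-1)."""
    n = len(f)
    image = set(range(n))
    # Applying f to the whole set n times leaves exactly the cyclic nodes.
    for _ in range(n):
        image = {f[x] for x in image}
    return len(image)

def family(n, d):
    """All maps on n points with exactly n/d nodes of indegree d, the rest 0."""
    m = n // d
    for f in product(range(n), repeat=n):
        if sorted(Counter(f).values()) == [d] * m:
            yield f

def tail_probabilities(n, d):
    """Return (|F|, [Pr(nu >= k) for k = 0 .. m+1]) by brute force."""
    fam = list(family(n, d))
    m = n // d
    probs = [Fraction(sum(cyclic_count(f) >= k for f in fam), len(fam))
             for k in range(m + 2)]
    return len(fam), probs

size, probs = tail_probabilities(4, 2)
print(size == comb(4, 2) * factorial(4) // factorial(2) ** 2)
print(probs == [Fraction(2 ** k * comb(2, k), comb(4, k)) for k in range(4)])
```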


We are interested in the moments of the distribution of $\nu$. To obtain them we shall consider the generating function

$$F(z) = \sum_{k \ge 0} \Pr(\nu \ge k)\, z^k. \tag{3.3}$$

Clearly, the probability generating function of $\nu$, $C(z)$, satisfies

$$C(z) = F(z) - \frac{1}{z} (F(z) - 1); \tag{3.4}$$

hence

$$C'(1) = F(1) - 1,$$
$$C''(1) = 2F'(1) - 2F(1) + 2,$$
$$C'''(1) = 3F''(1) - 6F'(1) + 6F(1) - 6, \tag{3.5}$$

and so on. Now to obtain the derivatives of $F$ we first write

$$F(z) = G(zd) \Big/ \binom{n}{m}, \tag{3.6}$$

where

$$G(z) = \sum_{k \ge 0} \binom{n-k}{m-k} z^k = \sum_{k \le m} \binom{n-k}{n-m} z^k. \tag{3.7}$$


This is related to the tail of a negative binomial distribution that can be replaced by the tails of a binomial distribution, via the following theorem.

Theorem 1. Let $p$ be a probability, and let $q = 1 - p$. We have

$$\sum_{k \le m} \binom{r+k-1}{k} p^r q^k = \sum_{k \le m} \binom{r+m}{k} p^{r+m-k} q^k.$$

Proof: We may assume that $r$ is an integer. The term $\binom{r+k-1}{k} p^r q^k$ is the probability to obtain the $r$th success in a sequence of Bernoulli trials, after exactly $r + k$ trials. So that the left side is the probability to obtain the $r$th success in at most $r + m$ trials = the probability to fail at most $m$ times in $r + m$ trials.* ∎

Corollary 2. If $m$ is an integer and $x$, $y$, $r$ are arbitrary,

$$\sum_{k \le m} \binom{r+k-1}{k} x^k (x+y)^{m-k} = \sum_{k \le m} \binom{r+m}{k} x^k y^{m-k}.$$

Proof: Multiply both sides of the identity in Theorem 1 by $(x+y)^m / p^r$ and then replace $p \leftarrow y/(x+y)$ and $q \leftarrow x/(x+y)$. ∎
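Both identities can be spot-checked with exact rational arithmetic. The sketch below only treats integer $r \ge 1$; the function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def theorem1_sides(r, m, p):
    """Left and right sides of Theorem 1 for integer r >= 1 and probability p."""
    q = 1 - p
    left = sum(comb(r + k - 1, k) * p ** r * q ** k for k in range(m + 1))
    right = sum(comb(r + m, k) * p ** (r + m - k) * q ** k for k in range(m + 1))
    return left, right

def corollary2_sides(r, m, x, y):
    """Left and right sides of Corollary 2 for integer r >= 1."""
    left = sum(comb(r + k - 1, k) * x ** k * (x + y) ** (m - k)
               for k in range(m + 1))
    right = sum(comb(r + m, k) * x ** k * y ** (m - k) for k in range(m + 1))
    return left, right

print(theorem1_sides(3, 5, Fraction(2, 7)))
print(corollary2_sides(4, 6, 3, -2))
```

Because Corollary 2 is a polynomial identity in $x$ and $y$, the check also passes for values such as negative $y$ that have no probabilistic meaning.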

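Applying Corollary 2 to equation (7), with $r \leftarrow n - m + 1$, $x \leftarrow 1$, $y \leftarrow z - 1$, we get

$$G(z) = \sum_{k \le m} \binom{n+1}{k} (z-1)^{m-k}, \tag{3.8}$$

and hence

$$G^{(l)}(z) = \sum_{k \le m} \binom{n+1}{k} (m-k)^{\underline{l}} (z-1)^{m-k-l}. \tag{3.9}$$

Now we use the expansion†

$$(m-k)^{\underline{l}} = \sum_i \binom{l}{i} (-1)^i (m-i)^{\underline{l-i}}\, k^{\underline{i}}, \tag{3.10}$$

to obtain

$$G^{(l)}(z) = \sum_i \binom{l}{i} (-1)^i (m-i)^{\underline{l-i}} \sum_{k \le m} \binom{n+1}{k} k^{\underline{i}} (z-1)^{m-k-l}$$
$$\qquad = \sum_i \binom{l}{i} (-1)^i (m-i)^{\underline{l-i}} (n+1)^{\underline{i}} \sum_{k \le m} \binom{n+1-i}{k-i} (z-1)^{m-k-l}. \tag{3.11}$$

Applying Corollary 2 again, this time in reverse, yields

$$G^{(l)}(z) = \frac{1}{(z-1)^l} \sum_i \binom{l}{i} (-1)^i (m-i)^{\underline{l-i}} (n+1)^{\underline{i}}\, z^{-i} S_i(z), \tag{3.12}$$

where

$$S_i(z) = \sum_{i \le k \le m} \binom{n-k}{n-m} z^k = G(z) - \sum_{k < i} \binom{n-k}{n-m} z^k. \tag{3.13}$$

We are most interested in $G^{(l)}(z)$ when $l$ is small, hence the formulae that we have derived are actually "simple" to compute, in spite of their forbidding appearance. We have

$$G(z) = S_0(z);$$

$$G'(z) = \frac{m S_0(z) - (n+1) z^{-1} S_1(z)}{z - 1} = \frac{(mz - n - 1) S_0(z) + (n+1) \binom{n}{m}}{z(z-1)};$$

$$G''(z) = \frac{m(m-1) S_0(z) - 2(m-1)(n+1) z^{-1} S_1(z) + n(n+1) z^{-2} S_2(z)}{(z-1)^2}. \tag{3.14}$$

* The standard proof [Peu'sonSSj of Theorem 1 is to use calculus of complex variables to equate both sides to the tails of the Beta distribution. However after finding this proof, a careful search of the literature showed that it was already published twenty five years ago [Patil60].

† Proof: We repeatedly use the identities $x^{\overline{n}} = (x+n-1)^{\underline{n}}$ and $(-x)^{\underline{n}} = (-1)^n x^{\overline{n}}$ to obtain

$$(m-k)^{\underline{l}} = (-1)^l (k-m)^{\overline{l}} = (-1)^l (k-m+l-1)^{\underline{l}} = (-1)^l \sum_i \binom{l}{i} k^{\underline{i}} (-m+l-1)^{\underline{l-i}} = \sum_i \binom{l}{i} (-1)^i k^{\underline{i}} (m-i)^{\underline{l-i}}.$$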

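Let $A = S_0(d) / \binom{n}{m}$. Since $F^{(l)}(1) = d^l G^{(l)}(d) / \binom{n}{m}$ and $n = md$, we find that these equations simplify considerably:

$$F(1) = A;$$

$$F'(1) = \frac{n + 1 - A}{d - 1};$$

$$F''(1) = \frac{((d-1)n + 2d) A - 2(n+1) d}{(d-1)^2}. \tag{3.15}$$

It remains to determine the asymptotic value of $A = G(d) / \binom{n}{m}$. By equation (8) we can write this as

$$A = \frac{d^{n+1}}{(d-1)^{n+1-m} \binom{n}{m}} \sum_{k \le m} \binom{n+1}{k} \left( \frac{1}{d} \right)^k \left( 1 - \frac{1}{d} \right)^{n+1-k}. \tag{3.16}$$

The sum is $1/2 + O(n^{-1/2})$ by the central limit theorem, since it is the sum of all probabilities that are less than the mean value $(n+1)/d$ of a binomial distribution. And the leading coefficient is easy to evaluate by Stirling's approximation:

$$\frac{d^{n+1}}{(d-1)^{n+1-m} \binom{n}{m}} = \frac{\sqrt{2\pi m}\, m^m \sqrt{2\pi m (d-1)}\, (m(d-1))^{m(d-1)}}{\sqrt{2\pi m d}\, (md)^{md}} \cdot \frac{d^{md+1}}{(d-1)^{md+1-m}} \left( 1 + O(m^{-1}) \right) = \sqrt{\frac{2\pi n}{d-1}} \left( 1 + O(m^{-1}) \right). \tag{3.17}$$

Hence

$$A = \sqrt{\frac{\pi n}{2(d-1)}} + O(1). \tag{3.18}$$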


This  analysis  is  sufficient  to  give  the  leading  term  in  our  asymptotic  formulae, 
but  it  is  somewhat  unsatisfactory  since  it  does  not  make  clear  how  we  could  obtain 
better  accuracy.  For  example,  we  might  want  to  know  the  constant  term  of  A. 
The  next  section  sharpens  the  asymptotics  by  looking  closer  at  the  left  half  of  a 
binomial  distribution. 
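Since $A = G(d)/\binom{n}{m}$ is a finite sum of integers, it can be evaluated exactly and compared with the leading term. The following is a rough numerical sketch; the function name `A_exact` and the sample parameters are hypothetical.

```python
from fractions import Fraction
from math import comb, pi, sqrt

def A_exact(m, d):
    """A = G(d) / C(n, m) with n = m*d and G(z) = sum_k C(n-k, m-k) z^k."""
    n = m * d
    G = sum(comb(n - k, m - k) * d ** k for k in range(m + 1))
    # Equation (3.8) gives the same value in a different form.
    assert G == sum(comb(n + 1, k) * (d - 1) ** (m - k) for k in range(m + 1))
    return Fraction(G, comb(n, m))

for m, d in [(50, 2), (200, 3), (100, 5)]:
    n = m * d
    exact = float(A_exact(m, d))
    leading = sqrt(pi * n / (2 * (d - 1)))
    print(d, n, exact, leading, exact - leading)
```

The printed differences stay bounded as $n$ grows, which is the content of the $O(1)$ term; their apparent dependence on $d$ is what a sharper estimate has to explain.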

3.3.  Sums  of  Bernoulli  random  variables 


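We start from the following theorem due to Esseen [Esseen45]. (See also [GK68] and [Petrov75].)

Theorem 3. If $\xi_1, \xi_2, \ldots, \xi_n$ are independent, identically distributed, random variables, with mean 0, variance $\sigma^2$, and finite third moment, $\alpha_3$, such that the only possible values of $\xi_k$ are $a + \ell h$ for $\ell = 0, \pm 1, \pm 2, \ldots$, and $h$ is maximum, then the cumulative probability distribution

$$F_n(x) = \Pr\Bigl( \sum_k \xi_k / (\sqrt{n}\, \sigma) < x \Bigr)$$

satisfies

$$F_n(x) = \Phi(x) + \frac{e^{-x^2/2}}{\sqrt{2\pi n}} \left( \frac{(1 - x^2) \alpha_3}{6 \sigma^3} + \frac{h}{\sigma}\, S\!\left( \frac{x \sigma \sqrt{n}}{h} - \frac{na}{h} \right) \right) + o(n^{-1/2}),$$

where $\Phi(x)$ is the normal distribution,

$$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2}\, dt,$$

and $S(x)$ is a discontinuous function‡ of $x$,

$$S(x) = \lceil x \rceil - x - 1/2.$$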


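We are interested in the sum of $n$ Bernoulli variables $\chi_1, \chi_2, \ldots, \chi_n$, with $\Pr(\chi_k = 1) = p$. Let $\xi_k = \chi_k - p$, and $q = 1 - p$. Then $E(\xi_k) = 0$, $\operatorname{var}(\xi_k) = pq$, and $\alpha_3(\xi_k) = pq^3 - qp^3 = pq(q - p)$. We can apply Esseen's theorem with $a = -p$ and $h = 1$ to obtain

$$\Pr\Bigl( \sum_k \xi_k / \sqrt{npq} < x \Bigr) = \Phi(x) + \frac{e^{-x^2/2}}{\sqrt{2\pi npq}} \left( \frac{(1 - x^2)(q - p)}{6} + S(x \sqrt{npq} + np) \right) + O(n^{-1}). \tag{3.19}$$

(We are allowed in this case to replace $o(n^{-1/2})$ by $O(n^{-1})$ because the fourth moment is also finite. For details see [Esseen45] or [Petrov75].)

Making $x = 0$ and substituting $\chi_k - p$ for $\xi_k$ we obtain that

$$\Pr\Bigl( \sum_k \chi_k < np \Bigr) = \frac12 + \frac{1}{\sqrt{2\pi npq}} \left( \frac{q - p}{6} + \lceil np \rceil - np - \frac12 \right) + O(n^{-1}). \tag{3.20}$$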


‡ The literature is a bit confusing with respect to this function, so maybe some clarification is necessary. In the original paper by Esseen, $F_n(x)$ is probably meant to be $\Pr(\sum_k \xi_k / (\sqrt{n}\, \sigma) \le x)$. There is no definition of $F_n(x)$ but there is a picture that implies that $F_n(x)$ is continuous from the right. Esseen uses instead of $S$, the function $\bar{S}(x) = \lfloor x \rfloor - x + 1/2$. Notice that $S(x) = \bar{S}(x)$ for all non-integral $x$; at integral points $S(x)$ is continuous from the left, while $\bar{S}(x)$ is continuous from the right. In [GK68] the cumulative probability distribution, $F_n(x)$, is defined as above, but the authors incorrectly use $\bar{S}$. In [Petrov75] the function $S$ is described by its Fourier expansion only. Of course $S$ and $\bar{S}$ have the same Fourier expansion, and in fact this is the source of error in [GK68].
3.4.  Better  asymptotics


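Setting $n \leftarrow n + 1$, $p \leftarrow 1/d$, and $q \leftarrow 1 - 1/d$, in equation (20) we can improve the estimate of the sum in equation (16):

$$\sum_{k < (n+1)/d} \binom{n+1}{k} \left( \frac{1}{d} \right)^k \left( 1 - \frac{1}{d} \right)^{n+1-k} = \frac12 + \frac{2(d-2)}{3 \sqrt{2\pi n (d-1)}} + O(n^{-1}). \tag{3.21}$$

Hence we obtain a more precise value for $A$:

$$A = \sqrt{\frac{\pi n}{2(d-1)}} + \frac{2(d-2)}{3(d-1)} + O(n^{-1/2}). \tag{3.22}$$

From here, using the equations (15), we obtain

$$F'(1) = \frac{n}{d-1} - \frac{1}{d-1} \sqrt{\frac{\pi n}{2(d-1)}} + \frac{d+1}{3(d-1)^2} + O(n^{-1/2}). \tag{3.23}$$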


Finally,  going  back  to  the  equations  (5)  and  to  the  relevant  equations  in  Chap¬ 
ter  1,  we  obtain 


2(4-1)  3(4-1) 


+  0(n-*/»), 


...  (16  — 3jr)n  4+1  /  irn  . 

var(A)  -  —  yj 2^-3jj3  +  0(1); 


(3.24) 


(3.25) 
(3.26)


(3.27) 


cov(A.^)=.^y_’'>"+0(l). 


(3.28) 


These values confirm the constant term in a sharper form of Pollard's conjecture [Pollard82]. In principle, smaller order terms and higher order moments can be computed by the same method, using smaller order terms in the estimate of the tails of the binomial distribution.


3.5.  The  case  d  =  2 


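This case is of special interest for two reasons: it corresponds to the frequently used polynomial $x^2 + c$ in Pollard's method and we can obtain closed form formulae that are a useful check on the general case.

If $d = 2$, by equation (13) and Corollary 2, with $r \leftarrow m + 1$, $x \leftarrow 1$, and $y \leftarrow 1$, the sums $S_i$ are given by

$$S_i(2) = 2^n - \sum_{k < i} \binom{n-k}{n/2}\, 2^k.$$

From here

$$F(1) = \frac{2^n}{\binom{n}{n/2}},$$

$$F'(1) = n - \frac{2^n}{\binom{n}{n/2}} + 1 = n - \sqrt{\frac{\pi n}{2}} + 1 + O(n^{-1/2}),$$

$$F''(1) = \frac{n\, 2^n}{\binom{n}{n/2}} + \frac{4 \cdot 2^n}{\binom{n}{n/2}} - 4n - 4 = n \sqrt{\frac{\pi n}{2}} - 4n + O(n^{1/2}).$$

(We have used the expansion

$$\frac{2^n}{\binom{n}{n/2}} = \sqrt{\frac{\pi n}{2}} \left( 1 + \frac{1}{4n} + \frac{1}{32 n^2} + O(n^{-3}) \right),$$

which is easy to compute using Stirling's formula.)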
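The quality of the expansion of $2^n / \binom{n}{n/2}$ can be seen numerically. This is a small sketch with hypothetical function names, using the two correction terms $1/(4n)$ and $1/(32n^2)$.

```python
from math import comb, pi, sqrt

def A2_exact(n):
    """2^n / C(n, n/2) for even n, evaluated from exact integers."""
    return 2 ** n / comb(n, n // 2)

def A2_series(n):
    """sqrt(pi*n/2) * (1 + 1/(4n) + 1/(32 n^2)), the truncated expansion."""
    return sqrt(pi * n / 2) * (1 + 1 / (4 * n) + 1 / (32 * n * n))

for n in (10, 100, 1000):
    print(n, A2_exact(n), A2_series(n), A2_exact(n) / A2_series(n) - 1)
```

The relative error shrinks roughly like $n^{-3}$, consistent with the next term of the Stirling series.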
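Going back one more step, we finally obtain

$$E(\nu) = \frac{2^n}{\binom{n}{n/2}} - 1 = \sqrt{\frac{\pi n}{2}} - 1 + \frac14 \sqrt{\frac{\pi}{2n}} + O(n^{-3/2});$$

$$\operatorname{var}(\nu) = 2n - \frac{2^{2n}}{\binom{n}{n/2}^2} - \frac{2^n}{\binom{n}{n/2}} + 2 = \frac{(4 - \pi)n}{2} - \sqrt{\frac{\pi n}{2}} + \frac{8 - \pi}{4} + O(n^{-1/2});$$

$$E(\lambda) = \frac{2^{n-1}}{\binom{n}{n/2}} = \frac12 \sqrt{\frac{\pi n}{2}} + \frac18 \sqrt{\frac{\pi}{2n}} + O(n^{-3/2});$$

$$\operatorname{var}(\lambda) = \frac{2n}{3} - \frac{2^{2n-2}}{\binom{n}{n/2}^2} - \frac{2^{n-1}}{\binom{n}{n/2}} + \frac23 = \frac{(16 - 3\pi)n}{24} - \frac12 \sqrt{\frac{\pi n}{2}} + \frac{32 - 3\pi}{48} + O(n^{-1/2});$$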


,  2"-*  ,  2"  1 

®(m)  “/n\  n 

\n/2/  "(n/ay  ” 


_ _ _  .  (16  — 3T)n  1  Inn  (128  —  27jr) 


(3.30) 


(3.32) 


(3.33) 


(3.34) 
$E(\rho)$  =


-  1  + 


1 

n 


(»y  ‘  '  <n) 


var(p)  = 


(4 


K)n  Inn 

I  \Y 


(24  -  5?r) 


(3.35) 


,  n  22"-*  4  22"-» 

cov(A,/i)  =  -  -  -r— +  -  - 


2n-l  j 


\n/2J  '*Vn/2 

(8  - 3jr)n  64  -  ISjt 


24 


48 


+  0(n-») 


(3.36) 


Comparing these with the corresponding formulae for the uniform case (equations (2.35) and following) shows that the leading terms are unaffected, but the next terms decrease very slightly.


3.6.  The  parallelization  of  Pollard’s  factorization  algorithm 


Suppose that we have $s$ processors simultaneously running Pollard's algorithm trying to factor the same number. What is the expected speed-up? More precisely, what is the expected value of the minimum running time to completion, over the $s$ processors? As a model, let's assume that processor $i$ computes the length of the period and of the cycle of a random function $f_i \in F[n]$ (in fact a polynomial) starting from the point $x_i$. We need to compute

$$E(\min(\rho(f_1, x_1), \rho(f_2, x_2), \ldots, \rho(f_s, x_s))).$$

We consider two cases. The first case is that each processor chooses its function uniformly at random over the $n^n$ possible functions. Then the following theorem applies.

Theorem 4. Let $f_1, \ldots, f_s : \{1, \ldots, n\} \to \{1, \ldots, n\}$ be $s$ mappings chosen uniformly at random in $F[n]$. For any fixed choice of $x_1, x_2, \ldots, x_s$,

$$E(\min(\rho(f_1, x_1), \rho(f_2, x_2), \ldots, \rho(f_s, x_s))) = \sqrt{\frac{\pi n}{2s}} + O(1).$$




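Proof: From the results of Section 2.4 it follows that if $f_i$ is chosen uniformly at random, then

$$\Pr(\rho(f_i, x_i) \ge k) = \Pr(\nu(f_i) \ge k) = \frac{n^{\underline{k}}}{n^k}.$$

Because $\rho(f_1, x_1), \rho(f_2, x_2), \ldots, \rho(f_s, x_s)$ are independent random variables, we have

$$\Pr(\min(\rho(f_1, x_1), \rho(f_2, x_2), \ldots, \rho(f_s, x_s)) \ge k) = \left( \frac{n^{\underline{k}}}{n^k} \right)^s,$$

and

$$E(\min(\rho(f_1, x_1), \rho(f_2, x_2), \ldots, \rho(f_s, x_s))) = \sum_{k \ge 1} \left( \frac{n^{\underline{k}}}{n^k} \right)^s.$$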


For  a  fixed  a  and  every  it, 


©■■A(-‘)'-n.(-S-(S)) 


O) 


If $k > n/s$ then the term $n^{\underline{k}} / n^k$ is clearly exponentially small and therefore


+  exponentially  small  terms 


l<*<n/# 


l<k<n/s 


("/*)*  («/>)‘  »= ) 


It  is  easy  to  see  that 


< 


E 


(2Z£)f 

(»/»)* 


£  CChW). 


and  hence 


l<k<n/0 


+  0(1). 
Hence if every processor uses an independently chosen random function then the speed-up is $O(\sqrt{s})$, where $s$ is the number of processors. (Compare with equation (2.41).) The second case to consider is that all processors use the same function but different starting points. However, via very general principles, it can be shown that this strategy is never better than the first strategy.

We need now to compute

$$E(\min(\rho(f, x_1), \rho(f, x_2), \ldots, \rho(f, x_s))),$$

where $f$ is chosen uniformly at random in $F[n]$ and $x_1, x_2, \ldots, x_s$ are chosen uniformly at random in $\{1, \ldots, n\}$.

Define $b(f, k)$ to be the probability that $\rho(f, x) \ge k$ when $x$ is chosen uniformly at random. This means that $n\, b(f, k)$ is the number of (bad!) points $x$ such that $\rho(f, x) \ge k$. Then the probability that the second strategy requires at least $k$ steps is

$$\Pr(\min(\rho(f, x_1), \rho(f, x_2), \ldots, \rho(f, x_s)) \ge k) = \frac{1}{n^n} \sum_{f \in F[n]} b(f, k)^s, \tag{3.37}$$

while the probability (which we already computed) that the first strategy requires at least $k$ steps is

$$\Pr(\min(\rho(f_1, x_1), \rho(f_2, x_2), \ldots, \rho(f_s, x_s)) \ge k) = \left( \frac{1}{n^n} \sum_{f \in F[n]} b(f, k) \right)^s. \tag{3.38}$$

We shall now show that

$$\frac{1}{n^n} \sum_{f \in F[n]} b(f, k)^s \ge \left( \frac{1}{n^n} \sum_{f \in F[n]} b(f, k) \right)^s \tag{3.39}$$

regardless of the actual values of $b(f, k)$, and therefore

$$E(\min(\rho(f, x_1), \rho(f, x_2), \ldots, \rho(f, x_s))) \ge E(\min(\rho(f_1, x_1), \rho(f_2, x_2), \ldots, \rho(f_s, x_s))). \tag{3.40}$$

We start from

(3.40) 


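Theorem 5. Let $x_1 \le x_2 \le \cdots \le x_m$ be $m$ real points. Any real function $f$ such that $f''(x)$ exists and $f''(x) \ge 0$ on the interval $[x_1, x_m]$ satisfies

$$\frac{1}{m} \sum_{1 \le i \le m} f(x_i) \ge f\Bigl( \frac{1}{m} \sum_{1 \le i \le m} x_i \Bigr).$$

Proof: See [HLP59] page 72. ∎

Applying now Theorem 5 to the function $f(x) = x^s$, we obtain that for any $x_1, \ldots, x_m \ge 0$ we have

$$\frac{1}{m} \sum_{1 \le i \le m} x_i^s \ge \Bigl( \frac{1}{m} \sum_{1 \le i \le m} x_i \Bigr)^s, \tag{3.41}$$

and in particular

$$\frac{1}{n^n} \sum_{f \in F[n]} b(f, k)^s \ge \left( \frac{1}{n^n} \sum_{f \in F[n]} b(f, k) \right)^s, \tag{3.42}$$

which is the inequality we wanted to prove.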

In conclusion, if $s$ processors are running Pollard's algorithm in parallel, they should run it with different polynomials, for an expected speed-up of $\sqrt{s}$. The strategy of using the same polynomial and different starting points is inferior on average. Although the gain is relatively small compared with the number of processors used, the parallel version of Pollard's algorithm might be a good choice in certain situations (e.g., vector machines) because no communication is required.
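Under the uniform model of Theorem 4, the first-strategy expectation is the sum $\sum_{k \ge 1} (n^{\underline{k}}/n^k)^s$, which can be evaluated numerically to see the $\sqrt{s}$ speed-up. The sketch below uses floating point and a hypothetical function name; the cut-off `1e-18` is an arbitrary choice for dropping negligible terms.

```python
from math import pi, sqrt

def expected_min_rho(n, s):
    """sum_{k>=1} (n^(falling k) / n^k)^s, the first-strategy expectation."""
    total, term = 0.0, 1.0
    for k in range(1, n + 1):
        term *= (n - k + 1) / n      # term = n^(falling k) / n^k
        contribution = term ** s
        total += contribution
        if contribution < 1e-18:     # remaining terms are negligible
            break
    return total

n = 10000
one = expected_min_rho(n, 1)
for s in (1, 4, 16):
    e = expected_min_rho(n, s)
    print(s, e, sqrt(pi * n / (2 * s)), one / e)
```

The last column is the speed-up relative to a single processor; for these parameters it is close to $\sqrt{s}$.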
3.7.  Open  problems

The obvious, but probably hopeless question is how accurately the model used here for Pollard's method reflects reality.

Another problem, more amenable to solution, is to compute the expected values of $\lambda$ and $\mu$ if every node has indegree either $a$, or $b$, or 0; more generally one can consider a given indegree probability distribution, or other distributions closely related to the polynomials that are actually used.


Appendix  B

A  strange  example 


This is an example of a permutation invariant weight for which the correlation of $\lambda$ and $\mu$ is positive. Consider the following probability weight on $F[n]$:

$$w(f) = \begin{cases} \epsilon, & \text{if } C_0(f) \text{ is a permutation of } 1, 1, \ldots, 1, 2, 2, 3; \\ 1 - n(n-1)(n-2)\epsilon/2, & \text{if } C_0(f) = 1, 1, \ldots, 1; \\ 0, & \text{otherwise.} \end{cases}$$

Clearly this weight is permutation invariant. The probability generating function for the number of cyclic elements is

$$C(z) = (1 - (3n^2 - 13n + 14)\epsilon)\, z + (3n^2 - 19n + 32)\epsilon\, z^2 + 6(n - 3)\epsilon\, z^3.$$


Using  the  relevant  equations  from  Chapter  1,  it  can  be  shown  that 


...  .  29  ,  38\  .  /Sn< 


39n®  157n* 


-  37n  -t- 103  + 


Therefore  if,  say,  n  =  100  and  e  =  10~^,  then  the  covariance  is  positive.  (Namely, 
it  is  equal  to  0.000001529 . . .  .) 


